1. Preliminaries



1.1 Optimization Problems 

We first formally define what we mean by an optimization problem. The definition be- 
low focusses on minimization problems. Note that it extends naturally to maximization 
problems. 

Definition 1.1. A minimization problem $\Pi$ is given by a set of instances $\mathcal{I}$. Each instance $I \in \mathcal{I}$ specifies

• a set $\mathcal{F}$ of feasible solutions for $I$;

• a cost function $c : \mathcal{F} \to \mathbb{R}$.

Given an instance $I = (\mathcal{F}, c) \in \mathcal{I}$, the goal is to find a feasible solution $S \in \mathcal{F}$ such that $c(S)$ is minimum. We call such a solution an optimal solution of $I$.

In discrete (or combinatorial) optimization we concentrate on optimization problems $\Pi$ where, for every instance $I = (\mathcal{F}, c)$, the set $\mathcal{F}$ of feasible solutions is discrete, i.e., $\mathcal{F}$ is finite or countably infinite. We give some examples below.

Minimum Spanning Tree Problem (MST):

Given: An undirected graph $G = (V,E)$ with edge costs $c : E \to \mathbb{R}$.
Goal: Find a spanning tree of $G$ of minimum total cost.

We have

$$\mathcal{F} = \{T \subseteq E \mid T \text{ is a spanning tree of } G\} \quad\text{and}\quad c(T) = \sum_{e \in T} c(e).$$

Traveling Salesman Problem (TSP):

Given: An undirected graph $G = (V,E)$ with distances $d : E \to \mathbb{R}$.
Goal: Find a tour of $G$ of minimum total length.

Here we have

$$\mathcal{F} = \{T \subseteq E \mid T \text{ is a tour of } G\} \quad\text{and}\quad c(T) = \sum_{e \in T} d(e).$$

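Linear Programming (LP):
Given: A set $\mathcal{F}$ of feasible solutions $x = (x_1, \dots, x_n)$ defined by $m$ linear constraints

$$\mathcal{F} = \Big\{ (x_1, \dots, x_n) \in \mathbb{R}^n \;\Big|\; \sum_{i=1}^{n} a_{ij} x_i \le b_j \quad \forall j = 1, \dots, m \Big\}$$

and an objective function $c(x) = \sum_{i=1}^{n} c_i x_i$.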
Goal: Find a feasible solution $x \in \mathcal{F}$ that minimizes $c(x)$.

Note that in this example the number of feasible solutions in $\mathcal{F}$ is uncountable. So why does this problem qualify as a discrete optimization problem? The answer is that $\mathcal{F}$ defines a feasible set that (assuming it is bounded) corresponds to the convex hull of a finite number of vertices. It is not hard to see that if we optimize a linear function over a convex hull then there always exists an optimal solution that is a vertex. We can thus equivalently formulate the problem as finding a vertex $x$ of the convex hull defined by $\mathcal{F}$ that minimizes $c(x)$.

1.2 Algorithms and Efficiency 

Intuitively, an algorithm for an optimization problem $\Pi$ is a sequence of instructions specifying a computational procedure that solves every given instance $I$ of $\Pi$. Formally, the computational model underlying all our considerations is the one of a Turing machine (which we will not define formally here).

A main focus of this course is on efficient algorithms. Here, efficiency refers to the overall running time of the algorithm. We do not care about the wall-clock running time (in terms of minutes, seconds, etc.), but rather about the number of basic operations.
Certainly, there are different ways to represent the overall running time of an algorithm. 
The one that we will use here (and which is widely used in the algorithms community) 
is the so-called worst-case running time. Informally, the worst-case running time of an 
algorithm measures the running time of an algorithm on the worst possible input instance 
(of a given size). 

There are at least two advantages in assessing the algorithm's performance by means 
of its worst-case running time. First, it is usually rather easy to estimate. Second, it 
provides a very strong performance guarantee: The algorithm is guaranteed to compute a 
solution to every instance (of a given size), using no more than the stated number of basic 
operations. On the downside, the worst-case running time of an algorithm might be an 
overly pessimistic estimation of its actual running time. In the latter case, assessing the 
performance of an algorithm by its average case running time or its smoothed running 
time might be suitable alternatives. 

Usually, the running time of an algorithm is expressed as a function of the size of the input instance $I$. Note that a priori it is not clear what is meant by the size of $I$ because there are different ways to represent (or encode) an instance.

Example 1.1. Many optimization problems have a graph as input. Suppose we are given an undirected graph $G = (V,E)$ with $n$ nodes and $m$ edges. One way of representing $G$ is by its $n \times n$ adjacency matrix $A = (a_{ij})$ with $a_{ij} = 1$ if $(i,j) \in E$ and $a_{ij} = 0$ otherwise. The size needed to represent $G$ by its adjacency matrix is thus $n^2$. Another way to represent $G$ is by its adjacency lists: For every node $i \in V$, we maintain the set $L_i \subseteq V$ of nodes that are adjacent to $i$ in a list. Note that each edge occurs on two adjacency lists. The size to represent $G$ by adjacency lists is $n + 2m$.

The above example illustrates that the size of an instance depends on the underlying data 
structure that is used to represent the instance. Depending on the kind of operations that 
an algorithm uses, one might be more efficient than the other. For example, checking whether a given edge $(i,j)$ is part of $G$ takes constant time if we use the adjacency matrix, while it takes time $|L_i|$ (or $|L_j|$) if we use the adjacency lists. On the other hand, listing all edges incident to $i$ takes time $n$ if we use the adjacency matrix, while it takes time $|L_i|$ if we use the adjacency lists.
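
The trade-off between the two encodings can be sketched in a few lines of Python. This is only an illustration: the 4-node graph and all function names are assumptions made for the example and do not come from the notes.

```python
# Two encodings of the same undirected graph (hypothetical example instance).
n = 4
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]  # m = 4 edges

# Adjacency matrix: n * n entries, a[i][j] = 1 iff (i, j) is an edge.
matrix = [[0] * n for _ in range(n)]
for i, j in edges:
    matrix[i][j] = matrix[j][i] = 1

# Adjacency lists: one list L_i per node; every edge is stored twice.
adj = [[] for _ in range(n)]
for i, j in edges:
    adj[i].append(j)
    adj[j].append(i)


def has_edge_matrix(i, j):
    """Edge test with the matrix: a single lookup, constant time."""
    return matrix[i][j] == 1


def has_edge_lists(i, j):
    """Edge test with the lists: scans L_i, so time |L_i|."""
    return j in adj[i]


def neighbors_matrix(i):
    """Listing the neighbors of i with the matrix: scans a full row, time n."""
    return [j for j in range(n) if matrix[i][j] == 1]


matrix_size = n * n                         # n^2
lists_size = n + sum(len(L) for L in adj)   # n + 2m
```

For sparse graphs (m much smaller than n^2) the list encoding is considerably smaller, which is one reason why running times are usually stated in terms of both n and m.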

Formally, we define the size of an instance $I$ as the number of bits that are needed to store all data of $I$ using encoding $L$ on a digital computer and use $|L(I)|$ to refer to this number. Note that according to this definition we also would have to account for the number of bits needed to store the numbers associated with the instance (like nodes or edges). However, most computers nowadays treat all integers in their range, say from $0$ to $2^{31}$, the same and allocate a word to each such number. We therefore often take the freedom to rely on a more intuitive definition of size by counting the number of objects (like nodes or edges) of the instance rather than their total binary length.

Definition 1.2. Let $\Pi$ be an optimization problem and let $L$ be an encoding of the instances. Then algorithm ALG solves $\Pi$ in (worst-case) running time $f$ if ALG computes for every instance $I$ of size $n_I = |L(I)|$ an optimal solution $S \in \mathcal{F}$ using at most $f(n_I)$ basic operations.

1.3 Growth of Functions 

We are often interested in the asymptotic running time of the algorithm. The following 
definitions will be useful. 

Definition 1.3. Let $g : \mathbb{N} \to \mathbb{R}^+$. We define

$$O(g(n)) = \{ f : \mathbb{N} \to \mathbb{R}^+ \mid \exists c > 0,\ n_0 \in \mathbb{N} \text{ such that } f(n) \le c \cdot g(n) \ \ \forall n \ge n_0 \}$$

$$\Omega(g(n)) = \{ f : \mathbb{N} \to \mathbb{R}^+ \mid \exists c > 0,\ n_0 \in \mathbb{N} \text{ such that } f(n) \ge c \cdot g(n) \ \ \forall n \ge n_0 \}$$

$$\Theta(g(n)) = \{ f : \mathbb{N} \to \mathbb{R}^+ \mid f(n) \in O(g(n)) \text{ and } f(n) \in \Omega(g(n)) \}$$

We will often write, e.g., $f(n) = O(g(n))$ instead of $f(n) \in O(g(n))$, even though this is notationally somewhat imprecise.

We consider a few examples: We have $10n^2 = O(n^2)$, $\frac{1}{2}n^2 = \Omega(n^2)$, $10 n \log n = \Omega(n)$, $10 n \log n = O(n^2)$, $2^{n+1} = O(2^n)$ and $O(\log m) = O(\log n)$ if $m \le n^c$ for some constant $c$ (recall that $\log(n^c) = c \log n$).

1.4 Graphs

An undirected graph $G$ consists of a finite set $V(G)$ of nodes (or vertices) and a finite set $E(G)$ of edges. For notational convenience, we will also write $G = (V,E)$ to refer to a graph with node set $V = V(G)$ and edge set $E = E(G)$. Each edge $e \in E$ is associated with an unordered pair $(u,v) \in V \times V$; $u$ and $v$ are called the endpoints of $e$. If two edges have the same endpoints, then they are called parallel edges. An edge whose endpoints are the same is called a loop. A graph that has neither parallel edges nor loops is said to be simple. Note that in a simple graph every edge $e = (u,v) \in E$ is uniquely identified by its endpoints $u$ and $v$. Unless stated otherwise, we assume that undirected graphs are simple. We denote by $n$ and $m$ the number of nodes and edges of $G$, respectively. A complete graph is a graph that contains an edge for every (unordered) pair of nodes. That is, a complete graph has $m = n(n-1)/2$ edges.

A subgraph $H$ of $G$ is a graph such that $V(H) \subseteq V$ and $E(H) \subseteq E$ and each $e \in E(H)$ has the same endpoints in $H$ as in $G$. Given a subset $V' \subseteq V$ of nodes and a subset $E' \subseteq E$ of edges of $G$, the subgraph $H$ of $G$ induced by $V'$ and $E'$ is defined as the (unique) subgraph $H$ of $G$ with $V(H) = V'$ and $E(H) = E'$. Given a subset $E' \subseteq E$, $G \setminus E'$ refers to the subgraph $H$ of $G$ that we obtain if we delete all edges in $E'$ from $G$, i.e., $V(H) = V$ and $E(H) = E \setminus E'$. Similarly, given a subset $V' \subseteq V$, $G \setminus V'$ refers to the subgraph of $G$ that we obtain if we delete all nodes in $V'$ and their incident edges from $G$, i.e., $V(H) = V \setminus V'$ and $E(H) = E \setminus \{(u,v) \in E \mid u \in V' \text{ or } v \in V'\}$. A subgraph $H$ of $G$ is said to be spanning if it contains all nodes of $G$, i.e., $V(H) = V$.

A path $P$ in an undirected graph $G$ is a sequence $P = (v_1, \dots, v_k)$ of nodes such that $e_i = (v_i, v_{i+1})$ ($1 \le i < k$) is an edge of $G$. We say that $P$ is a path from $v_1$ to $v_k$, or a $v_1,v_k$-path. $P$ is simple if all $v_i$ ($1 \le i \le k$) are distinct. Note that if there is a $v_1,v_k$-path in $G$, then there is a simple one. Unless stated otherwise, the length of $P$ refers to the number of edges of $P$. A path $C = (v_1, \dots, v_k = v_1)$ that starts and ends in the same node is called a cycle. $C$ is simple if all nodes $v_1, \dots, v_{k-1}$ are distinct. A graph is said to be acyclic if it does not contain a cycle.

A connected component $C \subseteq V$ of an undirected graph $G$ is a maximal subset of nodes such that for every two nodes $u,v \in C$ there is a $u,v$-path in $G$. A graph $G$ is said to be connected if for every two nodes $u,v \in V$ there is a $u,v$-path in $G$. A connected subgraph $T$ of $G$ that does not contain a cycle is called a tree of $G$. A spanning tree $T$ of $G$ is a tree of $G$ that contains all nodes of $G$. A subgraph $F$ of $G$ is a forest if it consists of a (disjoint) union of trees.

A directed graph $G = (V,E)$ is defined analogously with the only difference that edges are directed. That is, every edge $e$ is associated with an ordered pair $(u,v) \in V \times V$. Here $u$ is called the source (or tail) of $e$ and $v$ is called the target (or head) of $e$. Note that, as opposed to the undirected case, edge $(u,v)$ is different from edge $(v,u)$ in the directed case. All concepts introduced above extend in the obvious way to directed graphs.

1.5 Sets, etc. 

Let $S$ be a set and $e \notin S$. We will write $S + e$ as shorthand for $S \cup \{e\}$. Similarly, for $e \in S$ we write $S - e$ as shorthand for $S \setminus \{e\}$.

The symmetric difference of two sets $S$ and $T$ is defined as $S \triangle T = (S \setminus T) \cup (T \setminus S)$.

We use $\mathbb{N}$, $\mathbb{Z}$, $\mathbb{Q}$ and $\mathbb{R}$ to refer to the set of natural, integer, rational and real numbers, respectively. We use $\mathbb{Q}^+$ and $\mathbb{R}^+$ to refer to the nonnegative rational and real numbers, respectively.

1.6 Basics of Linear Programming Theory



Many optimization problems can be formulated as an integer linear program (ILP). Let $\Pi$ be a minimization problem. Then $\Pi$ can often be formulated as follows:

$$\begin{aligned}
\text{minimize} \quad & \sum_{j=1}^{n} c_j x_j \\
\text{subject to} \quad & \sum_{j=1}^{n} a_{ij} x_j \ge b_i && \forall i \in \{1, \dots, m\} \\
& x_j \in \{0,1\} && \forall j \in \{1, \dots, n\}
\end{aligned} \qquad (1)$$

Here, $x_j$ is a decision variable that is either set to $0$ or $1$. The above ILP is therefore also called a 0/1-ILP. The coefficients $a_{ij}$, $b_i$ and $c_j$ are given rational numbers.

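If we relax the integrality constraint on $x_j$, we obtain the following LP-relaxation of the above ILP (1):

$$\begin{aligned}
\text{minimize} \quad & \sum_{j=1}^{n} c_j x_j \\
\text{subject to} \quad & \sum_{j=1}^{n} a_{ij} x_j \ge b_i && \forall i \in \{1, \dots, m\} \\
& x_j \ge 0 && \forall j \in \{1, \dots, n\}
\end{aligned} \qquad (2)$$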



In general, we would have to enforce that $x_j \le 1$ for every $j \in \{1, \dots, n\}$ additionally. However, these constraints are often redundant because of the minimization objective and this is what we assume subsequently. Let $\mathrm{OPT}$ and $\mathrm{OPT}_{LP}$ refer to the objective function values of an optimal integer and fractional solution to the ILP (1) and LP (2), respectively. Because every integer solution to (1) is also a feasible solution for (2), we have $\mathrm{OPT}_{LP} \le \mathrm{OPT}$. That is, the optimal fractional solution provides a lower bound on the optimal integer solution. Recall that establishing a lower bound on the optimal cost is often the key to deriving good approximation algorithms for the optimization problem. The techniques that we will discuss subsequently exploit this observation in various ways.

Let $(x_j)$ be an arbitrary feasible solution. Note that $(x_j)$ has to satisfy each of the $m$ constraints of (2). Suppose we multiply each constraint $i \in \{1, \dots, m\}$ with a non-negative value $y_i$ and add up all these constraints. Then

$$\sum_{i=1}^{m} \Big( \sum_{j=1}^{n} a_{ij} x_j \Big) y_i \ \ge\ \sum_{i=1}^{m} b_i y_i.$$

Suppose further that the multipliers $y_i$ are chosen such that $\sum_{i=1}^{m} a_{ij} y_i \le c_j$ for every $j$. Then, because $x_j \ge 0$,

$$\sum_{j=1}^{n} c_j x_j \ \ge\ \sum_{j=1}^{n} \Big( \sum_{i=1}^{m} a_{ij} y_i \Big) x_j \ =\ \sum_{i=1}^{m} \Big( \sum_{j=1}^{n} a_{ij} x_j \Big) y_i \ \ge\ \sum_{i=1}^{m} b_i y_i. \qquad (3)$$

That is, every such choice of multipliers establishes a lower bound on the objective function value of $(x_j)$. Because this holds for an arbitrary feasible solution $(x_j)$ it also holds for the optimal solution. The problem of finding the best such multipliers (providing the largest lower bound on $\mathrm{OPT}_{LP}$) corresponds to the so-called dual program of (2).

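$$\begin{aligned}
\text{maximize} \quad & \sum_{i=1}^{m} b_i y_i \\
\text{subject to} \quad & \sum_{i=1}^{m} a_{ij} y_i \le c_j && \forall j \in \{1, \dots, n\} \\
& y_i \ge 0 && \forall i \in \{1, \dots, m\}
\end{aligned} \qquad (4)$$

We use $\mathrm{OPT}_{DP}$ to refer to the objective function value of an optimal solution to the dual linear program (4).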

There is a strong relation between the primal LP (2) and its corresponding dual LP (4). Note that (3) shows that the objective function value of an arbitrary feasible dual solution $(y_i)$ is less than or equal to the objective function value of an arbitrary feasible primal solution $(x_j)$. In particular, this relation also holds for the optimal solutions and thus $\mathrm{OPT}_{DP} \le \mathrm{OPT}_{LP}$. This is sometimes called weak duality. From linear programming theory, we know that an even stronger relation holds:

Theorem 1.1 (strong duality). Let $x = (x_j)$ and $y = (y_i)$ be feasible solutions to the LPs (2) and (4), respectively. Then $x$ and $y$ are optimal solutions if and only if

$$\sum_{j=1}^{n} c_j x_j = \sum_{i=1}^{m} b_i y_i.$$

An alternative characterization is given by the complementary slackness conditions:

Theorem 1.2. Let $x = (x_j)$ and $y = (y_i)$ be feasible solutions to the LPs (2) and (4), respectively. Then $x$ and $y$ are optimal solutions if and only if the following conditions hold:

1. Primal complementary slackness conditions: for every $j \in \{1, \dots, n\}$, either $x_j = 0$ or the corresponding dual constraint is tight, i.e.,

$$\forall j \in \{1, \dots, n\}: \quad x_j > 0 \ \Rightarrow\ \sum_{i=1}^{m} a_{ij} y_i = c_j.$$

2. Dual complementary slackness conditions: for every $i \in \{1, \dots, m\}$, either $y_i = 0$ or the corresponding primal constraint is tight, i.e.,

$$\forall i \in \{1, \dots, m\}: \quad y_i > 0 \ \Rightarrow\ \sum_{j=1}^{n} a_{ij} x_j = b_i.$$
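
As a small sanity check of the statements above, the following Python sketch verifies feasibility, equality of the objective values and complementary slackness on one tiny instance. The instance (vertex cover on a triangle, written in the form of (1)) and all function names are assumptions chosen for illustration; exact rational arithmetic avoids rounding issues.

```python
from fractions import Fraction
from itertools import product

# Hypothetical instance: rows are constraints i (edges), columns are variables j (nodes).
a = [[1, 1, 0],
     [0, 1, 1],
     [1, 0, 1]]
b = [1, 1, 1]
c = [1, 1, 1]
m, n = len(a), len(a[0])


def primal_feasible(x):
    """x >= 0 and sum_j a_ij x_j >= b_i for all i, as in (2)."""
    return all(xj >= 0 for xj in x) and all(
        sum(a[i][j] * x[j] for j in range(n)) >= b[i] for i in range(m))


def dual_feasible(y):
    """y >= 0 and sum_i a_ij y_i <= c_j for all j, as in (4)."""
    return all(yi >= 0 for yi in y) and all(
        sum(a[i][j] * y[i] for i in range(m)) <= c[j] for j in range(n))


def primal_value(x):
    return sum(c[j] * x[j] for j in range(n))


def dual_value(y):
    return sum(b[i] * y[i] for i in range(m))


def complementary_slackness(x, y):
    """Both conditions of Theorem 1.2."""
    primal_cs = all(x[j] == 0 or sum(a[i][j] * y[i] for i in range(m)) == c[j]
                    for j in range(n))
    dual_cs = all(y[i] == 0 or sum(a[i][j] * x[j] for j in range(n)) == b[i]
                  for i in range(m))
    return primal_cs and dual_cs


half = Fraction(1, 2)
x = [half, half, half]  # candidate solution of (2)
y = [half, half, half]  # candidate solution of (4)

# Optimum of the 0/1-ILP (1) by brute force over all 2^n assignments.
opt_ilp = min(primal_value(z) for z in product([0, 1], repeat=n)
              if primal_feasible(z))
```

On this instance both candidates have objective value 3/2, so they are optimal by Theorem 1.1, while the brute-force ILP optimum is 2; the gap illustrates that $\mathrm{OPT}_{LP} \le \mathrm{OPT}$ can be strict.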
2. Minimum Spanning Trees



2.1 Introduction 

We consider the minimum spanning tree problem (MST), which is one of the simplest and 
most fundamental problems in network optimization: 

Minimum Spanning Tree Problem (MST): 

Given: An undirected graph $G = (V,E)$ and edge costs $c : E \to \mathbb{R}$.
Goal: Find a spanning tree $T$ of $G$ of minimum total cost.

Recall that $T$ is a spanning tree of $G$ if $T$ is a spanning subgraph of $G$ that is a tree. The cost $c(T)$ of a tree $T$ is defined as $c(T) = \sum_{e \in T} c(e)$. Note that we can assume without loss of generality that $G$ is connected because otherwise no spanning tree exists.

If all edges have non-negative costs, then the MST problem is equivalent to the connected 
subgraph problem which asks for the computation of a minimum cost subgraph H of G 
that connects all nodes of G. 

2.2 Coloring Procedure 

Most known algorithms for the MST problem belong to the class of greedy algorithms. 
From a high-level point of view, such algorithms iteratively extend a partial solution to 
the problem by always adding an element that causes the minimum cost increase in the 
objective function. While in general greedy choices may lead to suboptimal solutions, 
such choices lead to an optimal solution for the MST problem. 

We will get to know different greedy algorithms for the MST problem. All these algo- 
rithms can be described by means of an edge-coloring process: Initially, all edges are 
uncolored. In each step, we then choose an uncolored edge and color it either red (mean- 
ing that the edge is rejected) or blue (meaning that the edge is accepted). The process ends 
if there are no uncolored edges. Throughout the process, we make sure that we maintain 
the following color invariant: 

Invariant 2.1 (Color invariant). There is a minimum spanning tree containing all the blue 
edges and none of the red edges. 

The coloring process can be seen as maintaining a forest of blue trees. Initially, the 
forest consists of n isolated blue trees corresponding to the nodes in V. The edges are 
then iteratively colored red or blue. If an edge is colored blue, then the two blue trees 
containing the endpoints of this edge are combined into one new blue tree. If an edge is 
colored red, then this edge is excluded from the blue forest. The color invariant ensures 
that the forest of blue trees can always be extended to a minimum spanning tree (by using 
some of the uncolored edges and none of the red edges). Note that the color invariant 
ensures that the final set of blue edges constitutes a minimum spanning tree.

We next introduce two coloring rules on which our algorithms are based. We first need to introduce the notion of a cut. Let $G = (V,E)$ be an undirected graph. A cut of $G$ is a partition of the node set $V$ into two sets: $X$ and $\bar{X} = V \setminus X$. An edge $e = (u,v)$ is said to cross a cut $(X,\bar{X})$ if its endpoints lie in different parts of the cut, i.e., $u \in X$ and $v \in \bar{X}$. Let $\delta(X)$ refer to the set of all edges that cross $(X,\bar{X})$, i.e.,

$$\delta(X) = \{(u,v) \in E \mid u \in X,\ v \in V \setminus X\}.$$

Note that $\delta(\cdot)$ is symmetric, i.e., $\delta(X) = \delta(\bar{X})$.
We can now formulate the two coloring rules:

Blue rule: Select a cut $(X,\bar{X})$ that is not crossed by any blue edge. Among the uncolored edges in $\delta(X)$, choose one of minimum cost and color it blue.

Red rule: Select a simple cycle $C$ that does not contain any red edge. Among the uncolored edges in $C$, choose one of maximum cost and color it red.
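
To make the cut notation concrete, here is a minimal Python sketch of $\delta(X)$ and of a single application of the blue rule. The function names, the dictionary-based coloring and the small example graph are assumptions made for illustration only.

```python
def crossing_edges(edges, X):
    """Return delta(X): all edges with exactly one endpoint in X."""
    X = set(X)
    return [(u, v) for (u, v) in edges if (u in X) != (v in X)]


def apply_blue_rule(edges, cost, color, X):
    """Color a cheapest uncolored edge of delta(X) blue and return it.

    Raises ValueError if a blue edge crosses the cut (precondition violated);
    returns None if delta(X) contains no uncolored edge.
    """
    cut = crossing_edges(edges, X)
    if any(color.get(e) == "blue" for e in cut):
        raise ValueError("blue rule not applicable: a blue edge crosses the cut")
    uncolored = [e for e in cut if e not in color]
    if not uncolored:
        return None
    e = min(uncolored, key=lambda f: cost[f])
    color[e] = "blue"
    return e


# Hypothetical example graph on the nodes a, b, c, d.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
cost = {("a", "b"): 1, ("b", "c"): 2, ("a", "c"): 3, ("c", "d"): 4}
color = {}
first = apply_blue_rule(edges, cost, color, {"a"})        # cut ({a}, {b, c, d})
second = apply_blue_rule(edges, cost, color, {"a", "b"})  # cut ({a, b}, {c, d})
```

Applying the rule to the cut $(\{a\}, \{b,c,d\})$ a second time would raise an error, because the blue edge $(a,b)$ now crosses that cut.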

Our greedy algorithm is free to apply any of the two coloring rules in an arbitrary order 
until all edges are colored either red or blue. The next theorem proves correctness of the 
algorithm. 

Theorem 2.1. The greedy algorithm maintains the color invariant in each step and even- 
tually colors all edges. 

Proof. We show by induction on the number $t$ of steps that the algorithm maintains the color invariant. Initially, no edges are colored and thus the color invariant holds true for $t = 0$ (recall that we assume that $G$ is connected and thus a minimum spanning tree exists). Suppose the color invariant holds true after $t - 1$ steps ($t \ge 1$). Let $T$ be a minimum spanning tree satisfying the color invariant (after step $t - 1$).

Assume that in step $t$ we color an edge $e$ using the blue rule. If $e \in T$, then $T$ satisfies the color invariant after step $t$ and we are done. Otherwise, $e \notin T$. Consider the cut $(X,\bar{X})$ to which the blue rule is applied to color $e = (u,v)$ (see Figure 1). Because $T$ is a spanning tree, there is a path $P_{uv}$ in $T$ that connects the endpoints $u$ and $v$ of $e$. At least one edge, say $e'$, of $P_{uv}$ must cross $(X,\bar{X})$. Note that $e'$ cannot be red because $T$ satisfies the color invariant. Also $e'$ cannot be blue because of the pre-conditions of applying the blue rule. Thus, $e'$ is uncolored and by the choice of $e$, $c(e) \le c(e')$. By removing $e'$ from $T$ and adding $e$, we obtain a new spanning tree $T' = (T - e') + e$ of cost $c(T') = c(T) - c(e') + c(e) \le c(T)$. Thus, $T'$ is a minimum spanning tree that satisfies the color invariant after step $t$.

Figure 1: Illustration of the exchange argument used in the proof of Theorem 2.1 (blue rule).

Assume that in step $t$ we color an edge $e$ using the red rule. If $e \notin T$, then $T$ satisfies the color invariant after step $t$ and we are done. Otherwise, $e \in T$. Consider the cycle $C$ to which the red rule is applied to color $e = (u,v)$ (see Figure 2). By removing $e$ from $T$, we obtain two trees whose node sets induce a cut $(X,\bar{X})$. Note that $e$ crosses $(X,\bar{X})$. Because $C$ is a cycle, there must exist at least one other edge, say $e'$, in $C$ that crosses $(X,\bar{X})$. Note that $e'$ cannot be blue because $e' \notin T$ and the color invariant. Moreover, $e'$ cannot be red because of the pre-conditions of applying the red rule. Thus, $e'$ is uncolored and by the choice of $e$, $c(e) \ge c(e')$. By removing $e$ from $T$ and adding $e'$, we obtain a new spanning tree $T' = (T - e) + e'$ of cost $c(T') = c(T) - c(e) + c(e') \le c(T)$. Thus, $T'$ is a minimum spanning tree that satisfies the color invariant after step $t$.

Figure 2: Illustration of the exchange argument used in the proof of Theorem 2.1 (red rule).

Finally, we show that eventually all edges are colored. Suppose the algorithm stops because neither the blue rule nor the red rule applies but there is still some uncolored edge $e = (u,v)$. By the color invariant, the blue edges constitute a forest of blue trees. If both endpoints $u$ and $v$ of $e$ are part of the same blue tree $T$, then we can apply the red rule to the cycle induced by the unique path $P_{uv}$ from $u$ to $v$ in $T$ and $e$ to color $e$ red. If the endpoints $u$ and $v$ are contained in two different blue trees, say $T_u$ and $T_v$, then the node set of one of these trees, say $X = V(T_u)$, induces a cut $(X,\bar{X})$ to which the blue rule can be applied to color an uncolored edge (which must exist because of the presence of $e$). Thus an uncolored edge guarantees that either the red rule or the blue rule can be applied. □

2.3 Kruskal's Algorithm 

Kruskal's algorithm sorts the edges by non-decreasing cost and then considers the edges in this order. If the current edge $e_i = (u,v)$ has both its endpoints in the same blue tree, it is colored red; otherwise, it is colored blue. The algorithm is summarized in Algorithm 1.

Input: undirected graph $G = (V,E)$ with edge costs $c : E \to \mathbb{R}$.
Output: minimum spanning tree $T$

Initialize: all edges are uncolored
(Remark: we implicitly maintain a forest of blue trees below)
Let $(e_1, \dots, e_m)$ be the list of edges of $G$, sorted by non-decreasing cost
for $i \leftarrow 1$ to $m$ do
    if $e_i$ has both endpoints in the same blue tree then color $e_i$ red else color $e_i$ blue
end
Output the resulting tree $T$ of blue edges

Algorithm 1: Kruskal's MST algorithm.

It is easy to verify that in each case the pre-conditions of the respective rule are met: If the red rule applies, then the unique path $P_{uv}$ in the blue tree containing both endpoints of $e_i$, together with $e_i$, forms a cycle $C$. The edges in $C \cap P_{uv}$ are blue and $e_i$ is uncolored. We can thus apply the red rule to $e_i$. Otherwise, if the blue rule applies, then $e_i$ connects two blue trees, say $T_u$ and $T_v$, in the current blue forest. Consider the cut $(X,\bar{X})$ induced by the node set of $T_u$, i.e., $X = V(T_u)$. No blue edge crosses this cut. Moreover, $e_i$ is an uncolored edge that crosses this cut. Also observe that every other uncolored edge $e \in \delta(X)$ has cost $c(e) \ge c(e_i)$ because we color the edges by non-decreasing cost. We can therefore apply the blue rule to $e_i$. An immediate consequence of Theorem 2.1 is that Kruskal's algorithm computes a minimum spanning tree.

We next analyze the time complexity of the algorithm: The algorithm needs to sort the edges of $G$ by non-decreasing cost. There are different algorithms to do this with different running times. The most efficient comparison-based algorithms sort a list of $k$ elements in $O(k \log k)$ time. There is also a matching lower bound showing that comparison-based sorting cannot do better than that. That is, in our context we spend $\Theta(m \log m)$ time to sort the edges by non-decreasing cost.

We also need to maintain a data structure in order to determine whether an edge $e_i$ has both its endpoints in the same blue tree or not. A trivial implementation stores for each node a unique identifier of the tree it is contained in. Checking whether the endpoints $u$ and $v$ of edge $e_i = (u,v)$ are part of the same blue tree can then be done in $O(1)$ time. Merging two blue trees needs time $O(n)$ in the worst case. Since there are $m$ checks and at most $n - 1$ merges, the trivial implementation takes $O(m + n^2)$ time in total (excluding the time for sorting).

One can do much better by using a so-called union-find data structure. This data structure keeps track of the partition of the nodes into blue trees and allows only two types of operations: union and find. The find operation identifies the node set of the partition to which a given node belongs. It can be used to check whether the endpoints $u$ and $v$ of edge $e_i = (u,v)$ belong to the same tree or not. The union operation unites two node sets of the current partition into one. This operation is needed to update the partition whenever we color $e_i = (u,v)$ blue and have to join the respective blue trees $T_u$ and $T_v$. Sophisticated union-find data structures support a series of $n$ union and $m$ find operations on a universe of $n$ elements in time $O(n + m\,\alpha(n, \frac{m}{n}))$, where $\alpha(n,d)$ is the inverse Ackermann function (see [8, Chapter 2] and the references therein). $\alpha(n,d)$ is increasing in $n$ but grows extremely slowly for every fixed $d$, e.g., $\alpha(2^{65536}, 0) = 4$; for most practical situations, it can be regarded as a constant.

The overall time complexity of Kruskal's algorithm is thus $O(m \log m + n + m\,\alpha(n, \frac{m}{n})) = O(m \log m) = O(m \log n)$ (think about it!).

Corollary 2.1. Kruskal's algorithm solves the MST problem in time $O(m \log n)$.
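
The following Python sketch shows one way Kruskal's algorithm could be implemented with a simple union-find structure (union by size with path halving). It is a minimal illustration, not the data structure analyzed in [8]; the node numbering 0, ..., n-1, the edge format (cost, u, v) and the example graph are assumptions.

```python
def kruskal(n, edges):
    """Return (total_cost, tree_edges) for a graph on nodes 0..n-1.

    edges is a list of (cost, u, v) triples. Accepted edges correspond to
    blue edges, skipped edges to red edges.
    """
    parent = list(range(n))  # parent pointers of the union-find forest
    size = [1] * n           # component sizes, used for union by size

    def find(v):
        # Follow parent pointers to the representative, halving the path.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    def union(ru, rv):
        # Attach the smaller component below the larger one.
        if size[ru] < size[rv]:
            ru, rv = rv, ru
        parent[rv] = ru
        size[ru] += size[rv]

    total, tree = 0, []
    for cost, u, v in sorted(edges):   # non-decreasing cost
        ru, rv = find(u), find(v)
        if ru != rv:                   # endpoints in different blue trees: blue
            union(ru, rv)
            tree.append((u, v))
            total += cost
        # otherwise both endpoints lie in the same blue tree: red
    return total, tree


# Hypothetical example instance.
example_edges = [(1, 0, 1), (2, 1, 2), (3, 0, 2), (4, 2, 3), (5, 1, 3)]
mst_cost, mst_edges = kruskal(4, example_edges)
```

On the example, the edges (0,1), (1,2) and (2,3) are colored blue, while (0,2) and (1,3) close a cycle and are colored red.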
2.4 Prim's Algorithm

Prim's algorithm grows a single blue tree, starting at an arbitrary node $s \in V$. In every step, it chooses among all edges that are incident to the current blue tree $T$ containing $s$ an uncolored edge $e_i$ of minimum cost and colors it blue. The algorithm stops if $T$ contains all nodes. We implicitly assume that all edges that are not part of the final tree are colored red in a post-processing step. The algorithm is summarized in Algorithm 2.



Input: undirected graph $G = (V,E)$ with edge costs $c : E \to \mathbb{R}$.
Output: minimum spanning tree $T$

Initialize: all edges are uncolored
(Remark: we implicitly maintain a forest of blue trees below)
Choose an arbitrary node $s$
for $i \leftarrow 1$ to $n - 1$ do
    Let $T$ be the current blue tree containing $s$
    Select a minimum cost edge $e_i \in \delta(V(T))$ incident to $T$ and color it blue
end
Implicitly, color all remaining edges red
Output the resulting tree $T$ of blue edges

Algorithm 2: Prim's MST algorithm.

Note that the pre-conditions are met whenever the algorithm applies one of the two coloring rules: If the blue rule applies, then the node set $V(T)$ of the current blue tree $T$ containing $s$ induces a cut $(X,\bar{X})$ with $X = V(T)$. No blue edge crosses $(X,\bar{X})$ by construction. Moreover, $e_i$ is among all uncolored edges crossing the cut one of minimum cost and can thus be colored blue. If the red rule applies to edge $e = (u,v)$, both endpoints $u$ and $v$ are contained in the final tree $T$. The path $P_{uv}$ in $T$ together with $e$ induces a cycle $C$. All edges in $C \cap P_{uv}$ are blue and we can thus color $e$ red.

The time complexity of the algorithm depends on how efficiently we are able to identify a minimum cost edge $e_i$ that is incident to $T$. To this aim, good implementations use a priority queue data structure. The idea is to keep track of the minimum cost connections between nodes that are outside of $T$ to nodes in $T$. Suppose we maintain two data entries for every node $v \notin V(T)$: $\pi(v) = (u,v)$ refers to the edge that minimizes $c(u,v)$ among all $u \in V(T)$ and $d(v) = c(\pi(v))$ refers to the cost of this edge; we define $\pi(v) = \text{nil}$ and $d(v) = \infty$ if no such edge exists. Initially, we have for every node $v \in V \setminus \{s\}$:

$$\pi(v) = \begin{cases} (s,v) & \text{if } (s,v) \in E \\ \text{nil} & \text{otherwise} \end{cases} \qquad\text{and}\qquad d(v) = \begin{cases} c(s,v) & \text{if } (s,v) \in E \\ \infty & \text{otherwise.} \end{cases}$$

The algorithm now repeatedly chooses a node $v \notin V(T)$ with $d(v)$ minimum, adds it to the tree and colors its connecting edge $\pi(v)$ blue. Because $v$ is part of the new tree, we need to update the above data. This can be accomplished by iterating over all edges $(v,w) \in E$ incident to $v$ and verifying for every adjacent node $w$ with $w \notin V(T)$ whether the connection cost from $w$ to $T$ via edge $(v,w)$ is less than the one stored in $d(w)$ (via $\pi(w)$). If so, we update the respective data entries accordingly. Note that if the value of $d(w)$ changes, then it can only decrease.



There are several priority queue data structures that support all operations needed above: insert, find-min, delete-min and decrease-priority. In particular, using Fibonacci heaps, $m$ decrease-priority and $n$ insert/find-min/delete-min operations can be performed in time $O(m + n \log n)$.

Corollary 2.2. Prim's algorithm solves the MST problem in time $O(m + n \log n)$.
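
The bookkeeping with $\pi(v)$ and $d(v)$ can be sketched in Python as follows. Note the swapped-in technique: Python's standard library has no Fibonacci heap, so this sketch uses the binary heap from `heapq` and replaces decrease-priority by pushing a new entry and skipping stale ones. This gives $O(m \log n)$ rather than the bound of Corollary 2.2. The adjacency-list format and the example graph are assumptions.

```python
import heapq


def prim(n, adj, s=0):
    """Return (total_cost, tree_edges) for a connected graph on nodes 0..n-1.

    adj[v] is a list of (w, cost) pairs for the edges (v, w) incident to v.
    """
    in_tree = [False] * n
    d = [float("inf")] * n   # d[v]: cheapest known connection cost of v to T
    pi = [None] * n          # pi[v]: the edge realizing d[v], or None (nil)
    d[s] = 0
    heap = [(0, s)]
    total, tree = 0, []
    while heap:
        dv, v = heapq.heappop(heap)
        if in_tree[v] or dv > d[v]:
            continue                     # stale heap entry, skip it
        in_tree[v] = True
        if pi[v] is not None:            # color the connecting edge blue
            tree.append(pi[v])
            total += dv
        for w, cost in adj[v]:
            if not in_tree[w] and cost < d[w]:
                d[w] = cost              # d[w] can only decrease
                pi[w] = (v, w)
                heapq.heappush(heap, (cost, w))
    return total, tree


# Hypothetical example instance, given as (cost, u, v) triples.
example_edges = [(1, 0, 1), (2, 1, 2), (3, 0, 2), (4, 2, 3), (5, 1, 3)]
example_adj = [[] for _ in range(4)]
for cost, u, v in example_edges:
    example_adj[u].append((v, cost))
    example_adj[v].append((u, cost))

prim_cost, prim_edges = prim(4, example_adj, s=0)
```

Starting from node 0, the tree grows via the edges (0,1), (1,2) and (2,3), for a total cost of 7.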

References 

The presentation of the material in this section is based on [8, Chapter 6].

3. Matroids



3.1 Introduction

In the previous section, we have seen that the greedy algorithm can be used to solve the MST problem. An immediate question that comes to one's mind is which other problems can be solved by such an algorithm. In this section, we will see that the greedy algorithm applies to a much broader class of optimization problems.

We first define the notion of an independent set system. 

Definition 3.1. Let $S$ be a finite set and let $\mathcal{I}$ be a collection of subsets of $S$. $(S,\mathcal{I})$ is an independent set system if

(M1) $\emptyset \in \mathcal{I}$;

(M2) if $I \in \mathcal{I}$ and $J \subseteq I$, then $J \in \mathcal{I}$.

Each set $I \in \mathcal{I}$ is called an independent set; every other subset $I \subseteq S$ with $I \notin \mathcal{I}$ is called a dependent set. Further, suppose we are given a weight function $w : S \to \mathbb{R}$ on the elements in $S$.

Maximum Weight Independent Set Problem (MWIS):

Given: An independent set system $(S,\mathcal{I})$ and a weight function $w : S \to \mathbb{R}$.
Goal: Find an independent set $I \in \mathcal{I}$ of maximum weight $w(I) = \sum_{x \in I} w(x)$.

If $w(x) < 0$ for some $x \in S$, then $x$ will not be included in any optimum solution because $\mathcal{I}$ is closed under taking subsets. We can thus safely exclude such elements from the ground set $S$. Subsequently, we assume without loss of generality that all weights are nonnegative.

As an example, consider the following independent set system: Suppose we are given an undirected graph $G = (V,E)$ with weight function $w : E \to \mathbb{R}^+$. Define $S = E$ and $\mathcal{I} = \{F \subseteq E \mid F \text{ induces a forest in } G\}$. Note that $\emptyset \in \mathcal{I}$ and $\mathcal{I}$ is closed under taking subsets because each subset $J$ of a forest $I \in \mathcal{I}$ is a forest. Now, the problem of finding an independent set $I \in \mathcal{I}$ that maximizes $w(I)$ is equivalent to finding a spanning tree of $G$ of maximum weight. (Note that the latter can also be done by one of the MST algorithms that we have considered in the previous section.)

The greedy algorithm given in Algorithm 3 is a natural generalization of Kruskal's algorithm to independent set systems. It starts with the empty set $I = \emptyset$ and then iteratively extends $I$ by always adding an element $x \in S \setminus I$ of maximum weight, ensuring that $I + x$ remains an independent set.

Unfortunately, the greedy algorithm does not work for general independent set systems 
as the following example shows: 
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Input: independent set system (S,I) with weight function w : S — » K 
Output: independent set / 6 I of maximum weight 

1 Initialize: / = 0. 

2 while f/iere i's some x £ S\I with I + x £ I do 

3 Choose such an jc with w(x) maximum 

4 /•*-/ + * 

5 end 

6 return/ 



Algorithm 3: Greedy algorithm for matroids. 



Example 3.1. Suppose that we are given an undirected graph $G = (V,E)$ with
weight function $w : E \to \mathbb{R}$. Let $S = E$ and define $\mathcal{I} = \{M \subseteq E \mid M \text{ is a matching of } G\}$.
(Recall that a subset $M \subseteq E$ of the edges of $G$ is called a matching if no two edges of $M$
share a common endpoint.) It is not hard to see that $\emptyset \in \mathcal{I}$ and $\mathcal{I}$ is closed
under taking subsets. Thus Conditions (M1) and (M2) are satisfied and $(S, \mathcal{I})$ is an
independent set system. Note that finding an independent set $I \in \mathcal{I}$ of maximum weight
$w(I)$ is equivalent to finding a maximum weight matching in $G$. Suppose we run the above
greedy algorithm on the independent set system induced by the matching instance depicted
on the right. The algorithm returns the matching $\{(p,q), (r,s)\}$ of weight 12, which is not
a maximum weight matching (indicated in bold).

[Figure: matching instance on the nodes p, q, r, s; the maximum weight matching is indicated in bold.]
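The failure of the greedy algorithm on matchings can be reproduced on a small instance. The following is a minimal sketch, not the instance from the figure: the path a-b-c-d with weights 2, 3, 2 is a hypothetical example, and the names `greedy` and `is_matching` are illustrative. Scanning the elements in order of decreasing weight is equivalent to Algorithm 3 (up to tie-breaking) because an element that cannot be added to the current set can never be added to a superset of it.

```python
def greedy(ground_set, weight, is_independent):
    """Greedy algorithm of Algorithm 3, given an independence oracle."""
    chosen = set()
    # An element rejected once stays rejected, because independent sets
    # are closed under taking subsets; one sorted pass therefore suffices.
    for x in sorted(ground_set, key=weight, reverse=True):
        if is_independent(chosen | {x}):
            chosen.add(x)
    return chosen


def is_matching(edges):
    """True if no two edges share an endpoint."""
    endpoints = [v for e in edges for v in e]
    return len(endpoints) == len(set(endpoints))


# Hypothetical instance: the path a - b - c - d.
weights = {("a", "b"): 2, ("b", "c"): 3, ("c", "d"): 2}
result = greedy(weights.keys(), weights.get, is_matching)
greedy_weight = sum(weights[e] for e in result)
optimal_weight = weights[("a", "b")] + weights[("c", "d")]
```

On this instance the greedy algorithm picks only the middle edge, while the two outer edges together form a heavier matching.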



3.2 Matroids 



Even though the greedy algorithm described in Algorithm 3 does not work for general 
independent set systems, it does work for independent set systems that are matroids. 

Definition 3.2 (Matroid). An independent set system $M = (S, \mathcal{I})$ is a matroid if
(M3) if $I, J \in \mathcal{I}$ and $|I| < |J|$, then $I + x \in \mathcal{I}$ for some $x \in J \setminus I$.

Note that Condition (M3) essentially states that if $I$ and $J$ are two independent sets with
$|I| < |J|$, then there must exist an element $x \in J \setminus I$ that can be added to $I$ such that the
resulting set $I + x$ is still an independent set.
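For small, explicitly listed set systems, Condition (M3) can be checked by brute force. The sketch below is only an illustration under that assumption (the function name `satisfies_exchange` and both example families are hypothetical): it accepts the family of all subsets of size at most 2 of a three-element set and rejects the matchings of a path with three edges.

```python
from itertools import combinations


def satisfies_exchange(independent_sets):
    """Brute-force check of Condition (M3) on an explicitly listed family."""
    family = {frozenset(i) for i in independent_sets}
    for i in family:
        for j in family:
            if len(i) < len(j):
                # (M3) asks for some x in J \ I such that I + x is independent.
                if not any(i | {x} in family for x in j - i):
                    return False
    return True


# All subsets of {a, b, c} of size at most 2.
small_sets = [set(c) for r in range(3) for c in combinations("abc", r)]

# Matchings of a path with edges e1, e2, e3, where e2 is the middle edge.
path_matchings = [set(), {"e1"}, {"e2"}, {"e3"}, {"e1", "e3"}]
```

For the path, the pair $I = \{e_2\}$, $J = \{e_1, e_3\}$ violates (M3): neither edge of $J$ can be added to $I$.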

Given a subset $U \subseteq S$, a subset $B \subseteq U$ is called a basis of $U$ if $B$ is an inclusionwise
maximal independent subset of $U$, i.e., $B \in \mathcal{I}$ and there is no $I \in \mathcal{I}$ with $B \subsetneq I \subseteq U$. It is
not hard to show that Condition (M3) is equivalent to

(M4) for every subset $U \subseteq S$, any two bases of $U$ have the same size.

The common size of the bases of $U \subseteq S$ is called the rank of $U$ and denoted by $r(U)$.
An independent set is simply called a basis if it is a basis of $S$. The common size of the
bases of $S$ is called the rank of the matroid $M$. Note that if all weights are nonnegative,
the MWIS problem is equivalent to finding a maximum weight basis of $M$.

We give some examples of matroids.

Example 3.2 (Uniform matroid). One of the simplest examples of a matroid is the so-called
uniform matroid. Suppose we are given some set $S$ and an integer $k$. Define the
independent sets $\mathcal{I}$ as the set of all subsets of $S$ of size at most $k$, i.e., $\mathcal{I} = \{I \subseteq S \mid |I| \le k\}$.
It is easy to verify that $M = (S, \mathcal{I})$ is a matroid. $M$ is also called the $k$-uniform matroid.

Example 3.3 (Partition matroid). Another simple example of a matroid is the partition
matroid. Suppose $S$ is partitioned into $m$ sets $S_1, \dots, S_m$ and we are given $m$ integers
$k_1, \dots, k_m$. Define $\mathcal{I} = \{I \subseteq S \mid |I \cap S_i| \le k_i \text{ for all } 1 \le i \le m\}$. Conditions (M1) and
(M2) are trivially satisfied. To see that Condition (M3) is satisfied as well, note that if
$I, J \in \mathcal{I}$ and $|I| < |J|$, then there is some $i$ ($1 \le i \le m$) such that $|J \cap S_i| > |I \cap S_i|$ and thus
adding any element $x \in S_i \cap (J \setminus I)$ to $I$ maintains independence. Thus, $M = (S, \mathcal{I})$ is a
matroid.

Example 3.4 (Graphic matroid). Suppose we are given an undirected graph $G = (V,E)$.
Let $S = E$ and define $\mathcal{I} = \{F \subseteq E \mid F \text{ induces a forest in } G\}$. We already argued above
that Conditions (M1) and (M2) are satisfied. We next show that Condition (M4) is satisfied
too. Let $U \subseteq E$. Consider the subgraph $(V, U)$ of $G$ induced by $U$ and suppose that it
consists of $k$ components. By definition, each basis $B$ of $U$ is an inclusionwise maximal
forest contained in $U$. Thus, $B$ consists of $k$ spanning trees, one for each component of
the subgraph $(V, U)$. We conclude that $B$ contains $|V| - k$ elements. Because this holds
for every basis of $U$, Condition (M4) is satisfied. We remark that any matroid $M = (S, \mathcal{I})$
obtained in this way is also called a graphic matroid (or cycle matroid).
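For the graphic matroid, the independence test "adding this edge keeps a forest" can be answered with a union-find structure, which turns the greedy algorithm into Kruskal's algorithm for a maximum weight forest. The following is a minimal sketch under that assumption; the function name `max_weight_forest`, the edge encoding `(weight, u, v)` and the sample graph are illustrative choices, not taken from the text.

```python
def max_weight_forest(nodes, weighted_edges):
    """Greedy on the graphic matroid: returns a maximum weight forest.

    weighted_edges is a list of (weight, u, v) triples with weight >= 0.
    """
    parent = {v: v for v in nodes}

    def find(v):
        # Root of the component of v, with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    forest = []
    for w, u, v in sorted(weighted_edges, reverse=True):
        ru, rv = find(u), find(v)
        if ru != rv:  # the edge joins two components, so no cycle is closed
            parent[ru] = rv
            forest.append((u, v, w))
    return forest


# Hypothetical connected graph on 4 nodes (one component).
sample_edges = [(4, 1, 2), (3, 2, 3), (5, 1, 3), (1, 3, 4)]
forest = max_weight_forest([1, 2, 3, 4], sample_edges)
```

The sample graph is connected, so the resulting basis has $|V| - k = 4 - 1 = 3$ edges, in line with the counting argument above.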

Example 3.5 (Matching matroid). The independent set system of Example 3.1 is not a
matroid. However, there is another way of defining a matroid based on matchings. Let
$G = (V,E)$ be an undirected graph. Given a matching $M \subseteq E$ of $G$, let $V(M)$ refer to
the set of nodes that are incident to the edges of $M$. A node set $I \subseteq V$ is covered by
$M$ if $I \subseteq V(M)$. Define $S = V$ and $\mathcal{I} = \{I \subseteq V \mid I \text{ is covered by some matching } M\}$.
Condition (M1) holds trivially. Condition (M2) is satisfied because if a node set $I \in \mathcal{I}$ is
covered by a matching $M$, then each subset $J \subseteq I$ is also covered by $M$ and thus $J \in \mathcal{I}$. It
can also be shown that Condition (M3) is satisfied and thus $(S, \mathcal{I})$ is a matroid, which is
also called a matching matroid.

3.3 Greedy Algorithm for Matroids 

The next theorem shows that the greedy algorithm given in Algorithm 3 always computes 
a maximum weight independent set if the underlying independent set system is a ma- 
troid. The theorem actually shows something much stronger: Matroids are precisely the 
independent set systems for which the greedy algorithm computes an optimal solution. 

Theorem 3.1. Let $(S, \mathcal{I})$ be an independent set system. Further, let $w : S \to \mathbb{R}_+$ be a
nonnegative weight function on $S$. The greedy algorithm (Algorithm 3) computes an
independent set of maximum weight if and only if $M = (S, \mathcal{I})$ is a matroid.






Proof. We first show that the greedy algorithm computes a maximum weight independent
set if $M$ is a matroid. Let $X$ be the independent set returned by the greedy algorithm and
let $Y$ be a maximum weight independent set. Note that both $X$ and $Y$ are bases of $M$
(because all weights are nonnegative, we may assume that $Y$ is inclusionwise maximal).
Order the elements in $X = \{x_1, \dots, x_m\}$ such that $x_i$ ($1 \le i \le m$) is the $i$-th element chosen
by the algorithm. Clearly, $w(x_1) \ge \dots \ge w(x_m)$. Also order $Y = \{y_1, \dots, y_m\}$ such that
$w(y_1) \ge \dots \ge w(y_m)$. We will show that $w(x_i) \ge w(y_i)$ for every $i$. Let $k + 1$ be the smallest
integer such that $w(x_{k+1}) < w(y_{k+1})$. (The claim follows if no such choice exists.) Define
$I = \{x_1, \dots, x_k\}$ and $J = \{y_1, \dots, y_{k+1}\}$. Because $I, J \in \mathcal{I}$ and $|I| < |J|$, Condition (M3)
implies that there is some $y_i \in J \setminus I$ such that $I + y_i \in \mathcal{I}$. Note that $w(y_i) \ge w(y_{k+1}) >
w(x_{k+1})$. That is, in iteration $k + 1$, the greedy algorithm would prefer to add $y_i$ instead of
$x_{k+1}$ to extend $I$, which is a contradiction. We conclude that $w(X) \ge w(Y)$ and thus $X$ is a
maximum weight independent set.

Next assume that, for the independent set system $(S, \mathcal{I})$, the greedy algorithm computes an
independent set of maximum weight for every weight function $w : S \to \mathbb{R}_+$. We
show that $M = (S, \mathcal{I})$ is a matroid. Conditions (M1) and (M2) are satisfied by assumption.
It remains to show that Condition (M3) holds. Let $I, J \in \mathcal{I}$ with $|I| < |J|$ and assume, for
the sake of a contradiction, that $I + x \notin \mathcal{I}$ for every $x \in J \setminus I$. Let $k = |I|$ and consider the
following weight function on $S$:

\[
w(x) =
\begin{cases}
k + 2 & \text{if } x \in I \\
k + 1 & \text{if } x \in J \setminus I \\
0 & \text{otherwise.}
\end{cases}
\]

Now, in the first $k$ iterations, the greedy algorithm picks the elements in $I$. By assumption,
the algorithm cannot add any other element from $J \setminus I$ and thus outputs a solution of weight
$k(k + 2)$. However, the independent set $J$ has weight at least $|J|(k + 1) \ge (k + 1)(k + 1) >
k(k + 2)$. That is, the greedy algorithm does not compute a maximum weight independent
set, which is a contradiction. □

References

The presentation of the material in this section is based on [2, Chapter 8] and [7, Chapters
39 & 40].
4. Shortest Paths



4.1 Introduction 

We next consider shortest path problems. These problems are usually defined for directed
networks. Let $G = (V,E)$ be a directed graph with cost function $c : E \to \mathbb{R}$. Consider a
(directed) path $P = (v_1, \dots, v_k)$ from $s = v_1$ to $t = v_k$. The length of path $P$ is defined as
$c(P) = \sum_{i=1}^{k-1} c(v_i, v_{i+1})$. We can then ask for the computation of an $s,t$-path whose length
is shortest among all directed paths from $s$ to $t$. There are different variants of shortest
path problems:

1. Single source single target shortest path problem: Given two nodes s and t, deter- 
mine a shortest path from s to t . 

2. Single source shortest path problem: Given a node s, determine all shortest paths 
from s to every other node in V. 

3. All-pairs shortest path problem: For every pair $(s,t) \in V \times V$ of nodes, compute a
shortest path from $s$ to $t$.

The first problem is a special case of the second one. However, every known algorithm for 
the first problem implicitly also solves the second one (at least partially). We therefore 
focus here on the single source shortest path problem and the all-pairs shortest path 
problem. 

4.2 Single Source Shortest Path Problem 

We consider the following problem: 

Single Source Shortest Path Problem (SSSP): 

Given: A directed graph $G = (V,E)$ with cost function $c : E \to \mathbb{R}$ and a source
node $s \in V$.

Goal: Compute a shortest path from $s$ to every other node $v \in V$.

Note that a shortest path from s to a node v might not necessarily exist because of the 
following two reasons: First, v might not be reachable from s because there is no directed 
path from s to v in G. Second, there might be arbitrarily short paths from s to v because 
of the existence of an s,v-path that contains a cycle of negative length (which can be 
traversed arbitrarily often). We also call a cycle of negative total length a negative cycle.
The following lemma shows that these are the only two cases in which no shortest path 
exists. 

Lemma 4.1. Let v be a node that is reachable from s. Further assume that there is no 
path from s to v that contains a negative cycle. Then there exists a shortest path from s to 
v which is a simple path. 

Proof. Let $P$ be a path from $s$ to $v$. We can repeatedly remove cycles from $P$ until we
obtain a simple path $P'$. By assumption, all these cycles have non-negative lengths and
thus $c(P') \le c(P)$. It therefore suffices to show that there is a shortest path among all
simple $s,v$-paths. But this is obvious because there are only finitely many simple paths
from $s$ to $v$ in $G$. □



4.2.1 Basic properties of shortest paths 

We define a distance function $\delta : V \to \mathbb{R} \cup \{-\infty, \infty\}$ as follows: For every $v \in V$,

\[ \delta(v) = \inf\{c(P) \mid P \text{ is a path from } s \text{ to } v\}. \]

With the above lemma, we have

$\delta(v) = \infty$ if there is no path from $s$ to $v$,
$\delta(v) = -\infty$ if there is a path from $s$ to $v$ that contains a negative cycle,
$\delta(v) \in \mathbb{R}$ if there is a shortest (simple) path from $s$ to $v$.

The next lemma establishes that $\delta$ satisfies the triangle inequality.

Lemma 4.2. For every edge $e = (u,v) \in E$, we have $\delta(v) \le \delta(u) + c(u,v)$.

Proof. Clearly, the relation holds if $\delta(u) = \infty$. Suppose $\delta(u) = -\infty$. Then there is a path
$P$ from $s$ to $u$ that contains a negative cycle. By appending edge $e$ to $P$, we obtain a path
from $s$ to $v$ that contains a negative cycle and thus $\delta(v) = -\infty$. The relation again holds.
Finally, assume $\delta(u) \in \mathbb{R}$. Then there is a path $P$ from $s$ to $u$ of length $\delta(u)$. By appending
edge $e$ to $P$, we obtain a path from $s$ to $v$ of length $\delta(u) + c(u,v)$. A shortest path from $s$
to $v$ can only have shorter length and thus $\delta(v) \le \delta(u) + c(u,v)$. □

The following lemma shows that subpaths of shortest paths are shortest paths. 

Lemma 4.3. Let $P = (v_1, \dots, v_k)$ be a shortest path from $v_1$ to $v_k$. Then every subpath
$P' = (v_i, \dots, v_j)$ of $P$ with $1 \le i < j \le k$ is a shortest path from $v_i$ to $v_j$.

Proof. Suppose there is a path $P'' = (v_i, u_1, \dots, u_l, v_j)$ from $v_i$ to $v_j$ that is shorter than $P'$.
Then the path $(v_1, \dots, v_i, u_1, \dots, u_l, v_j, \dots, v_k)$ is a $v_1,v_k$-path that is shorter than $P$, which
is a contradiction. □

Consider a shortest path $P = (s = v_1, \dots, v_k = v)$ from $s$ to $v$. The above lemma enables
us to show that every edge $e = (v_i, v_{i+1})$ of $P$ must be tight with respect to the distance
function $\delta$, where an edge $(u,v)$ is called tight if $\delta(v) = \delta(u) + c(u,v)$.

Lemma 4.4. Let $P = (s, \dots, u, v)$ be a shortest $s,v$-path. Then $\delta(v) = \delta(u) + c(u,v)$.

Proof. By Lemma 4.3, the subpath $P' = (s, \dots, u)$ of $P$ is a shortest $s,u$-path and thus
$\delta(u) = c(P')$. Because $P$ is a shortest $s,v$-path, we have $\delta(v) = c(P) = c(P') + c(u,v) =
\delta(u) + c(u,v)$. □






Suppose now that we can compute $\delta(v)$ for every node $v \in V$. Using the above lemmas, it
is not difficult to show that we can then also efficiently determine the shortest paths from
$s$ to every node $v \in V$ with $\delta(v) \in \mathbb{R}$: Let $V' = \{v \in V \mid \delta(v) \in \mathbb{R}\}$ be the set of nodes for
which there exists a shortest path from $s$. Note that $\delta(s) = 0$ and thus $s \in V'$. Further, let
$E'$ be the set of edges that are tight with respect to $\delta$, i.e.,

\[ E' = \{(u,v) \in E \mid \delta(v) = \delta(u) + c(u,v)\}. \]

Let $G' = (V', E')$ be the subgraph of $G$ induced by $V'$ and $E'$. Observe that we can
construct $G'$ in time $O(n + m)$. By Lemma 4.4, every edge of a shortest path from $s$ to
some node $v \in V'$ is tight. Thus every node $v \in V'$ is reachable from $s$ in $G'$. Consider a
path $P = (s = v_1, \dots, v_k = v)$ from $s$ to $v$ in $G'$. Then

\[ c(P) = \sum_{i=1}^{k-1} c(v_i, v_{i+1}) = \sum_{i=1}^{k-1} \bigl(\delta(v_{i+1}) - \delta(v_i)\bigr) = \delta(v) - \delta(s) = \delta(v). \]

That is, $P$ is a shortest path from $s$ to $v$ in $G$. $G'$ therefore represents all shortest paths from
$s$ to nodes $v \in V'$. We can now extract a spanning tree $T$ from $G'$ that is rooted at $s$, e.g.,
by performing a depth-first search from $s$. Such a tree can be computed in time $O(n + m)$.
Observe that $T$ contains for every node $v \in V'$ a unique $s,v$-path which is a shortest path
in $G$. $T$ is therefore also called a shortest-path tree. Note that $T$ is a very compact way to
store for every node $v \in V'$ a shortest path from $s$ to $v$. This tree needs $O(n)$ space only,
while listing all these paths explicitly may need $O(n^2)$ space.
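The construction of the shortest-path tree from the distances can be sketched as follows. This is an illustration under stated assumptions rather than a reference implementation: the function name `shortest_path_tree`, the dictionary encoding of edges and distances, and the sample graph are hypothetical, and edge costs are assumed to be integers so that the tightness test by exact equality is safe.

```python
import math


def shortest_path_tree(edges, dist, s):
    """Parent pointers of a shortest-path tree rooted at s.

    edges maps (u, v) to the cost c(u, v); dist maps each node to delta(v).
    """
    # Collect the tight edges, i.e. delta(v) == delta(u) + c(u, v).
    tight = {}
    for (u, v), c in edges.items():
        if math.isfinite(dist[u]) and dist[v] == dist[u] + c:
            tight.setdefault(u, []).append(v)

    # Depth-first search from s in the graph of tight edges.
    parent = {s: None}
    stack = [s]
    while stack:
        u = stack.pop()
        for v in tight.get(u, []):
            if v not in parent:
                parent[v] = u
                stack.append(v)
    return parent


# Hypothetical instance with known distances from "s".
sample_edges = {("s", "a"): 1, ("s", "b"): 4, ("a", "b"): 2, ("b", "c"): 1}
sample_dist = {"s": 0, "a": 1, "b": 3, "c": 4}
tree = shortest_path_tree(sample_edges, sample_dist, "s")
```

Following the parent pointers from a node back to $s$ yields a shortest path in reverse order; the edge (s, b) is not tight here and is therefore never used.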

In light of the above observations, we will subsequently concentrate on the problem of
computing the distance function $\delta$ efficiently. To this aim, we introduce a function
$d : V \to \mathbb{R} \cup \{-\infty, \infty\}$ of tentative distances. The algorithm will use $d$ to compute a more and more
refined approximation of $\delta$ until eventually $d(v) = \delta(v)$ for every $v \in V$. We initialize
$d(s) = 0$ and $d(v) = \infty$ for every $v \in V \setminus \{s\}$. The only operation that is used to modify $d$
is to relax an edge $e = (u,v) \in E$:

Relax$(u,v)$:

if $d(v) > d(u) + c(u,v)$ then $d(v) = d(u) + c(u,v)$

It is obvious that the $d$-values can only decrease by edge relaxations.

We show that if we only relax edges then the tentative distances will never be less than 
the actual distances. 

Lemma 4.5. For every $v \in V$, $d(v) \ge \delta(v)$.

Proof. The proof is by induction on the number of relaxations. The claim holds after
the initialization because $d(v) = \infty \ge \delta(v)$ and $d(s) = 0 = \delta(s)$. For the induction step,
suppose that the claim holds true before the relaxation of an edge $e = (u,v)$. We show
that it remains valid after edge $e$ has been relaxed. By relaxing $(u,v)$, only $d(v)$ can be
modified. If $d(v)$ is modified, then after the relaxation we have $d(v) = d(u) + c(u,v) \ge
\delta(u) + c(u,v) \ge \delta(v)$, where the first inequality follows from the induction hypothesis
and the latter inequality holds because of the triangle inequality (Lemma 4.2). □
That is, $d(v)$ decreases throughout the execution but will never be lower than the actual
distance $\delta(v)$. In particular, $d(v) = \delta(v) = \infty$ for all nodes $v \in V$ that are not reachable
from $s$. Our goal will be to use only few edge relaxations to ensure that $d(v) = \delta(v)$ for
every $v \in V$ with $\delta(v) \in \mathbb{R}$.

Lemma 4.6. Let $P = (s, \dots, u, v)$ be a shortest $s,v$-path. If $d(u) = \delta(u)$ before the
relaxation of edge $e = (u,v)$, then $d(v) = \delta(v)$ after the relaxation of edge $e$.

Proof. Note that after the relaxation of edge $e$, we have $d(v) = d(u) + c(u,v) = \delta(u) +
c(u,v) = \delta(v)$, where the last equality follows from Lemma 4.4. □

4.2.2 Arbitrary cost functions 

The above lemma makes it clear what our goal should be. Namely, ideally we should
relax the edges of $G$ in the order in which they appear on shortest paths. The dilemma, of
course, is that we do not know these shortest paths. The following algorithm, also known
as the Bellman-Ford algorithm, circumvents this problem by simply relaxing every edge
exactly $n - 1$ times, thereby also relaxing all edges along shortest paths in the right order.
An illustration is given in Figure 3.



Input: directed graph $G = (V,E)$, cost function $c : E \to \mathbb{R}$, source node $s \in V$
Output: shortest path distances $d : V \to \mathbb{R}$

Initialize: $d(s) = 0$ and $d(v) = \infty$ for every $v \in V \setminus \{s\}$
for $i \leftarrow 1$ to $n - 1$ do
    foreach $(u,v) \in E$ do Relax$(u,v)$
end
return $d$

Algorithm 4: Bellman-Ford algorithm for the SSSP problem.

Lemma 4.7. After the Bellman-Ford algorithm terminates, $d(v) = \delta(v)$ for all $v \in V$ with
$\delta(v) > -\infty$.

Proof. As argued above, after the initialization we have $d(v) = \delta(v)$ for all $v \in V$ with
$\delta(v) = \infty$. Consider a node $v \in V$ with $\delta(v) \in \mathbb{R}$. Let $P = (s = v_1, \dots, v_k = v)$ be a shortest
$s,v$-path. By Lemma 4.1, we may assume that $P$ is simple and thus $k \le n$. Define a phase
of the algorithm as the execution of the inner loop. That is, the
algorithm consists of $n - 1$ phases and in each phase every edge of $G$ is relaxed exactly
once. Note that $d(s) = \delta(s)$ after the initialization. Using induction on $i$ and Lemma 4.6,
we can show that $d(v_{i+1}) = \delta(v_{i+1})$ at the end of phase $i$. Thus, after at most $n - 1$ phases
$d(v) = \delta(v)$ for every $v \in V$ with $\delta(v) \in \mathbb{R}$. □

Note that the algorithm does not identify nodes $v \in V$ with $\delta(v) = -\infty$. However, this
can be accomplished in a post-processing step (see exercises). The time complexity of
the algorithm is obviously $O(nm)$. Clearly, we might improve on this by stopping the
algorithm as soon as all tentative distances remain unchanged in a phase. However, this
does not improve on the worst case running time.
Figure 3: Illustration of the Bellman-Ford algorithm. The order in which the edges are
relaxed in this example is as follows: We start with the upper right node and proceed in
a clockwise order. For each node, edges are relaxed in clockwise order. Tight edges are
indicated in bold. Only the first three phases are depicted (no change in the final phase).

Theorem 4.1. The Bellman-Ford algorithm solves the SSSP problem without negative
cycles in time $\Theta(nm)$.
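A minimal Python sketch of the Bellman-Ford algorithm, including the early stop mentioned above, might look as follows. The function name `bellman_ford`, the dictionary encoding of the edges and the sample graph are illustrative assumptions; the sketch assumes that the input contains no negative cycle reachable from the source.

```python
import math


def bellman_ford(nodes, edges, s):
    """Tentative distances after at most n - 1 phases of edge relaxations.

    edges maps (u, v) to the cost c(u, v).
    """
    d = {v: math.inf for v in nodes}
    d[s] = 0
    for _ in range(len(nodes) - 1):
        changed = False
        for (u, v), c in edges.items():
            # Relax(u, v): inf + c stays inf, so unreachable nodes are skipped.
            if d[u] + c < d[v]:
                d[v] = d[u] + c
                changed = True
        if not changed:  # no label changed in this phase, so we can stop
            break
    return d


# Hypothetical instance with one negative edge and an unreachable node "z".
sample_nodes = ["s", "a", "b", "c", "z"]
sample_edges = {("s", "a"): 4, ("s", "b"): 5, ("b", "a"): -2, ("a", "c"): 3}
distances = bellman_ford(sample_nodes, sample_edges, "s")
```

In this instance the path via b improves the label of a from 4 to 3, and the node z keeps the label $\infty$ because it is not reachable from s.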

4.2.3 Nonnegative cost functions 

The running time of $O(nm)$ of the Bellman-Ford algorithm is rather large. We can
significantly improve upon this in certain special cases. The easiest such special case is if the
graph $G$ is acyclic.

Another example is if the edge costs are nonnegative. Subsequently, we assume that the
cost function $c : E \to \mathbb{R}_+$ is nonnegative.

The best algorithm for the SSSP with nonnegative cost functions is known as Dijkstra's
algorithm. As before, the algorithm starts with $d(s) = 0$ and $d(v) = \infty$ for every
$v \in V \setminus \{s\}$. It also maintains a set $V^*$ of nodes whose distances are tentative. Initially, $V^* = V$.
The algorithm repeatedly chooses a node $u \in V^*$ with $d(u)$ minimum, removes it from
$V^*$ and relaxes all outgoing edges $(u,v)$. The algorithm stops when $V^* = \emptyset$. A formal
description is given in Algorithm 5.

Note that the algorithm relaxes every edge exactly once. Intuitively, the algorithm can
be viewed as maintaining a "cloud" of nodes ($V \setminus V^*$) whose distance labels are exact. In
each iteration, the algorithm chooses a node $u \in V^*$ that is closest to the cloud, declares its
distance label as exact and relaxes all its outgoing edges. As a consequence, other nodes
outside of the cloud might get closer to the cloud. An illustration of the execution of the
algorithm is given in Figure 4.
Input: directed graph $G = (V,E)$, nonnegative cost function $c : E \to \mathbb{R}_+$, source node $s \in V$
Output: shortest path distances $d : V \to \mathbb{R}$

Initialize: $d(s) = 0$ and $d(v) = \infty$ for every $v \in V \setminus \{s\}$
$V^* = V$
while $V^* \neq \emptyset$ do
    Choose a node $u \in V^*$ with $d(u)$ minimum.
    Remove $u$ from $V^*$.
    foreach $(u,v) \in E$ do Relax$(u,v)$
end
return $d$

Algorithm 5: Dijkstra's algorithm for the SSSP problem.

The correctness of the algorithm follows from the following lemma.

Lemma 4.8. Whenever a node $u$ is removed from $V^*$, we have $d(u) = \delta(u)$.

Proof. The proof is by contradiction. Consider the first iteration in which a node $u$ is
removed from $V^*$ while $d(u) > \delta(u)$. Let $A \subseteq V$ be the set of nodes $v$ with $d(v) = \delta(v)$.
Note that $u$ is reachable from $s$ because $\delta(u) < \infty$. Let $P$ be a shortest $s,u$-path. If we
traverse $P$ from $s$ to $u$, then there must be an edge $(x,y) \in P$ with $x \in A$ and $y \notin A$ because
$s \in A$ and $u \notin A$. Let $(x,y)$ be the first such edge on $P$. We have $d(x) = \delta(x) \le \delta(u) < d(u)$,
where the first inequality holds because all edge costs are nonnegative. Consequently, $x$
was removed from $V^*$ before $u$. By the choice of $u$, $d(x) = \delta(x)$ when $x$ was removed
from $V^*$. But then, by Lemma 4.6, we must have $d(y) = \delta(y)$ after the relaxation of edge
$(x,y)$, which is a contradiction to the assumption that $y \notin A$. □

The running time of Dijkstra's algorithm crucially relies on the underlying data structure.
An efficient way to keep track of the tentative distance labels and the set $V^*$ is to use
priority queues. We need at most $n$ insert (initialization), $n$ delete-min (removing nodes
with minimum $d$-value) and $m$ decrease-priority operations (updating distance labels after
edge relaxations). Fibonacci heaps support this sequence of operations in total time $O(m + n \log n)$.

Theorem 4.2. Dijkstra's algorithm solves the SSSP problem with nonnegative edge costs
in time $O(m + n \log n)$.
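The following Python sketch illustrates Dijkstra's algorithm. Note the assumptions: it uses the binary heap from the standard `heapq` module with lazy deletion instead of a Fibonacci heap, so its running time is $O((n + m) \log n)$ rather than the bound of Theorem 4.2, and the function name `dijkstra`, the adjacency-list encoding and the sample graph are illustrative.

```python
import heapq
import math


def dijkstra(adj, s):
    """Shortest path distances from s for nonnegative edge costs.

    adj maps every node u to a list of (v, cost) pairs for its outgoing edges.
    """
    d = {v: math.inf for v in adj}
    d[s] = 0
    done = set()          # the "cloud" V \ V* of nodes with exact labels
    heap = [(0, s)]
    while heap:
        du, u = heapq.heappop(heap)
        if u in done:     # outdated heap entry (lazy deletion)
            continue
        done.add(u)
        for v, c in adj[u]:
            # Relax(u, v); a new heap entry replaces decrease-priority.
            if du + c < d[v]:
                d[v] = du + c
                heapq.heappush(heap, (d[v], v))
    return d


# Hypothetical instance; node "z" is not reachable from "s".
sample_adj = {
    "s": [("a", 2), ("b", 5)],
    "a": [("b", 1), ("c", 4)],
    "b": [("c", 1)],
    "c": [],
    "z": [],
}
dijkstra_distances = dijkstra(sample_adj, "s")
```

Unlike Algorithm 5, this sketch never removes unreachable nodes from the queue explicitly; they simply keep the label $\infty$.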

4.3 All-pairs Shortest-path Problem 

We next consider the following problem: 
All-pairs Shortest Path Problem (APSP):

Given: A directed graph $G = (V,E)$ with cost function $c : E \to \mathbb{R}$.
Goal: Determine a shortest $s,t$-path for every pair $(s,t) \in V \times V$.
Figure 4: Illustration of Dijkstra's algorithm. The nodes in $V \setminus V^*$ are depicted in gray.
The current node that is removed from $V^*$ is drawn in bold. The respective edge
relaxations are indicated in bold.
Input: directed graph $G = (V,E)$, cost function $c : E \to \mathbb{R}$ without negative cycles
Output: shortest path distances $d : V \times V \to \mathbb{R}$

Initialize: foreach $(u,v) \in V \times V$ do
    $d(u,v) = 0$ if $u = v$,
    $d(u,v) = c(u,v)$ if $(u,v) \in E$,
    $d(u,v) = \infty$ otherwise.
for $k \leftarrow 1$ to $n$ do
    foreach $(u,v) \in V \times V$ do
        if $d(u,v) > d(u,k) + d(k,v)$ then $d(u,v) = d(u,k) + d(k,v)$
    end
end
return $d$

Algorithm 6: Floyd-Warshall algorithm for the APSP problem.



We assume that $G$ contains no negative cycle.

Define a distance function $\delta : V \times V \to \mathbb{R} \cup \{\infty\}$ as

\[ \delta(u,v) = \inf\{c(P) \mid P \text{ is a path from } u \text{ to } v\}. \]

Note that $\delta$ is not necessarily symmetric. As for the SSSP problem, we can concentrate
on the computation of the distance function $\delta$ because the actual shortest paths can be
extracted from these distances.

Clearly, one way to solve the APSP problem is to simply solve $n$ SSSP problems: For
every node $s \in V$, solve the SSSP problem with source node $s$ to compute all distances
$\delta(s, \cdot)$. Using the Bellman-Ford algorithm, the worst-case running time of this algorithm
is $O(n^2 m)$, which for dense graphs is $\Theta(n^4)$. We will see that we can do better.

The idea is based on a general technique to derive exact algorithms known as dynamic 
programming. Basically, the idea is to decompose the problem into smaller sub-problems 
which can be solved individually and to use these solutions to construct a solution for the 
whole problem in a bottom-up manner. 

Suppose the nodes in $V$ are identified with the set $\{1, \dots, n\}$. In order to define the
dynamic program, we need some more notation. Consider a simple $u,v$-path $P = (u =
v_1, \dots, v_l = v)$. We call the nodes $v_2, \dots, v_{l-1}$ the interior nodes of $P$; $P$ has no interior
nodes if $l \le 2$. A $u,v$-path $P$ whose interior nodes are all contained in $\{1, \dots, k\}$ is called
a $(u,v,k)$-path. Define

\[ \delta_k(u,v) = \inf\{c(P) \mid P \text{ is a } (u,v,k)\text{-path}\} \]

as the shortest path distance of a $(u,v,k)$-path. Clearly, with this definition we have
$\delta(u,v) = \delta_n(u,v)$. Our task is therefore to compute $\delta_n(u,v)$ for every $u, v \in V$.

Our dynamic program is based on the following observation. Suppose we are able to
compute $\delta_{k-1}(u,v)$ for all $u, v \in V$. Consider a shortest $(u,v,k)$-path $P = (u = v_1, \dots, v_l =
v)$. Note that $P$ is simple because we assume that $G$ contains no negative cycles. By
definition, the interior nodes of $P$ belong to the set $\{1, \dots, k\}$. There are two cases that
can occur: Either node $k$ is an interior node of $P$ or not.

First, assume that $k$ is not an interior node of $P$. Then all interior nodes of $P$ must belong
to the set $\{1, \dots, k-1\}$. That is, $P$ is a shortest $(u,v,k-1)$-path and thus $\delta_k(u,v) =
\delta_{k-1}(u,v)$.

Next, suppose $k$ is an interior node of $P$, i.e., $P = (u = v_1, \dots, k, \dots, v_l = v)$. We can then
break $P$ into two paths $P_1 = (u, \dots, k)$ and $P_2 = (k, \dots, v)$. Note that the interior nodes of
$P_1$ and $P_2$ are contained in $\{1, \dots, k-1\}$ because $P$ is simple. Moreover, because subpaths
of shortest paths are shortest paths, we conclude that $P_1$ is a shortest $(u,k,k-1)$-path and
$P_2$ is a shortest $(k,v,k-1)$-path. Therefore, $\delta_k(u,v) = \delta_{k-1}(u,k) + \delta_{k-1}(k,v)$.

The above observations lead to the following recursive definition of $\delta_k(u,v)$:

\[
\delta_0(u,v) =
\begin{cases}
0 & \text{if } u = v \\
c(u,v) & \text{if } (u,v) \in E \\
\infty & \text{otherwise}
\end{cases}
\]

and

\[ \delta_k(u,v) = \min\{\delta_{k-1}(u,v),\ \delta_{k-1}(u,k) + \delta_{k-1}(k,v)\} \quad \text{if } k \ge 1. \]

The Floyd-Warshall algorithm simply computes $\delta_k(u,v)$ in a bottom-up manner. The
algorithm is given in Algorithm 6.

Theorem 4.3. The Floyd-Warshall algorithm solves the APSP problem without negative
cycles in time $\Theta(n^3)$.
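The bottom-up computation can be sketched in Python as follows. The names are illustrative assumptions: `floyd_warshall` is a hypothetical function name, the nodes are numbered $0, \dots, n-1$ instead of $1, \dots, n$, and the sample costs form a small graph without negative cycles.

```python
import math


def floyd_warshall(n, cost):
    """All-pairs distances for nodes 0..n-1; cost maps (u, v) to c(u, v)."""
    # Initialization corresponds to delta_0.
    d = [[0 if u == v else cost.get((u, v), math.inf) for v in range(n)]
         for u in range(n)]
    # After the iteration for node k, d[u][v] is the length of a shortest
    # u,v-path whose interior nodes all lie in {0, ..., k}.
    for k in range(n):
        for u in range(n):
            for v in range(n):
                if d[u][k] + d[k][v] < d[u][v]:
                    d[u][v] = d[u][k] + d[k][v]
    return d


# Hypothetical instance; the only cycle 1 -> 2 -> 3 -> 1 has length 1.
sample_cost = {(0, 1): 3, (1, 2): -2, (0, 2): 4, (2, 3): 2, (3, 1): 1}
apsp = floyd_warshall(4, sample_cost)
```

A single matrix suffices, as in Algorithm 6: updating the entries in place never yields values below the true distances, since every entry always corresponds to the length of some path.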

References 

The presentation of the material in this section is based on [3, Chapters 25 and 26]. 
5. Maximum Flows



5.1 Introduction 

The maximum flow problem is a fundamental problem in combinatorial optimization with
many applications in practice. We are given a network consisting of a directed graph
$G = (V,E)$ and nonnegative capacities $c : E \to \mathbb{R}_+$ on the edges and a source node $s \in V$
and a target node $t \in V$.

Intuitively, think of G being a water network and suppose that we want to send as much 
water as possible (say per second) from a source node s (producer) to a target node t 
(consumer). An edge of G corresponds to a pipeline of the water network. Every pipeline 
comes with a capacity which specifies the maximum amount of water that can be sent 
through it (per second). Basically, the maximum flow problem asks how the water should 
be routed through the network such that the total amount of water that can be sent from s 
to t (per second) is maximized. 

We assume without loss of generality that $G$ is complete. If $G$ is not complete, then we
simply add every missing edge $(u,v) \in (V \times V) \setminus E$ to $G$ and define $c(u,v) = 0$. We also
assume that every node $u \in V$ lies on a path from $s$ to $t$ (other nodes will be irrelevant).

Definition 5.1. A flow (or $s,t$-flow) in $G$ is a function $f : V \times V \to \mathbb{R}$ that satisfies the
following three properties:

1. Capacity constraint: For all $u, v \in V$, $f(u,v) \le c(u,v)$.

2. Skew symmetry: For all $u, v \in V$, $f(u,v) = -f(v,u)$.

3. Flow conservation: For every $u \in V \setminus \{s,t\}$, we have

\[ \sum_{v \in V} f(u,v) = 0. \]

The quantity $f(u,v)$ can be interpreted as the net flow from $u$ to $v$ (which can be positive,
zero or negative). The capacity constraint ensures that the flow value of an edge does not
exceed the capacity of the edge. Note that skew symmetry expresses that the net flow
$f(u,v)$ that is sent from $u$ to $v$ is the negative of the net flow $f(v,u)$ that is sent from
$v$ to $u$. Also the total net flow from $u$ to itself is zero because $f(u,u) = -f(u,u) = 0$. The
flow conservation constraints make sure that the total flow out of a node $u \in V \setminus \{s,t\}$ is
zero. Because of skew symmetry, this is equivalent to stating that the total flow into $u$ is
zero.

Another way of interpreting the flow conservation constraints is that the total positive net
flow entering a node $u \in V \setminus \{s,t\}$ is equal to the total positive net flow leaving $u$, i.e.,

\[ \sum_{v \in V :\, f(v,u) > 0} f(v,u) = \sum_{v \in V :\, f(u,v) > 0} f(u,v). \]

The value $|f|$ of a flow $f$ refers to the total net flow out of $s$ (which by the flow conservation
constraints is the same as the total flow into $t$):

\[ |f| = \sum_{v \in V} f(s,v). \]

Figure 5: On the left: Input graph $G$ with capacities $c : E \to \mathbb{R}_+$. Only the edges with
positive capacity are shown. On the right: A flow $f$ of $G$ with flow value $|f| = 19$. Only
the edges with positive net flow are shown.

An example of a network and a flow is given in Figure 5.

The maximum flow problem reads as follows:

Maximum Flow Problem:

Given: A directed graph $G = (V,E)$ with capacities $c : E \to \mathbb{R}_+$, a source node
$s \in V$ and a destination node $t \in V$.
Goal: Compute an $s,t$-flow $f$ of maximum value.

We introduce some more notation. Given two sets $X, Y \subseteq V$, define

\[ f(X,Y) = \sum_{x \in X} \sum_{y \in Y} f(x,y). \]

We state a few properties. (You should try to convince yourself that these properties hold 
true.) 

Proposition 5.1. Let $f$ be a flow in $G$. Then the following holds true:

1. For every $X \subseteq V$, $f(X,X) = 0$.

2. For every $X, Y \subseteq V$, $f(X,Y) = -f(Y,X)$.

3. For every $X, Y, Z \subseteq V$ with $X \cap Y = \emptyset$,

\[ f(X \cup Y, Z) = f(X,Z) + f(Y,Z) \quad \text{and} \quad f(Z, X \cup Y) = f(Z,X) + f(Z,Y). \]

5.2 Residual Graph and Augmenting Paths 

Consider an $s,t$-flow $f$. Let the residual capacity of an edge $e = (u,v) \in E$ with respect
to $f$ be defined as

\[ r_f(u,v) = c(u,v) - f(u,v). \]
Figure 6: On the left: Residual graph $G_f$ with respect to the flow $f$ given in Figure 5. An
augmenting path $P$ with $r_f(P) = 4$ is indicated in bold. On the right: The flow $f'$ obtained
from $f$ by augmenting $r_f(P)$ units of flow along $P$. Only the edges with positive flow are
shown. The flow value of $f'$ is $|f'| = 23$. (Note that $f'$ is optimal because the cut $(X, \bar{X})$
of $G$ with $\bar{X} = \{q,t\}$ has capacity $c(X, \bar{X}) = 23$.)

Intuitively, the residual capacity $r_f(u,v)$ is the amount of flow that can additionally be
sent from $u$ to $v$ without exceeding the capacity $c(u,v)$. Call an edge $e \in E$ a residual
edge if it has positive residual capacity and a saturated edge otherwise. The residual
graph $G_f = (V, E_f)$ with respect to $f$ is the subgraph of $G$ whose edge set $E_f$ consists of
all residual edges, i.e.,

\[ E_f = \{e \in E \mid r_f(e) > 0\}. \]

See Figure 6 (left) for an example.

Lemma 5.1. Let $f$ be a flow in $G$. Let $g$ be a flow in the residual graph $G_f$ respecting
the residual capacities $r_f$. Then the combined flow $h = f + g$ is a flow in $G$ with value
$|h| = |f| + |g|$.

Proof. We show that all properties of Definition 5.1 are satisfied.
First, $h$ satisfies the skew symmetry property because for every $u, v \in V$

\[ h(u,v) = f(u,v) + g(u,v) = -(f(v,u) + g(v,u)) = -h(v,u). \]

Second, observe that for every $u, v \in V$, $g(u,v) \le r_f(u,v)$ and thus

\[ h(u,v) = f(u,v) + g(u,v) \le f(u,v) + r_f(u,v) = f(u,v) + (c(u,v) - f(u,v)) = c(u,v). \]

That is, the capacity constraints are satisfied.
Finally, we have for every $u \in V \setminus \{s,t\}$

\[ \sum_{v \in V} h(u,v) = \sum_{v \in V} f(u,v) + \sum_{v \in V} g(u,v) = 0 \]

and thus flow conservation is satisfied too.
Similarly, we can show that

\[ |h| = \sum_{v \in V} h(s,v) = \sum_{v \in V} f(s,v) + \sum_{v \in V} g(s,v) = |f| + |g|. \]

□



An augmenting path is a simple $s,t$-path $P$ in the residual graph $G_f$. Let $P$ be an
augmenting path in $G_f$. All edges of $P$ are residual edges. Thus, there exists some $x > 0$ such
that we can send $x$ flow units additionally along $P$ without exceeding the capacity of any
edge. In fact, we can choose $x$ to be as large as the residual capacity $r_f(P)$ of $P$ which is
defined as

\[ r_f(P) = \min\{r_f(u,v) \mid (u,v) \in P\}. \]

Note that if we increase the flow of an edge $(u,v) \in P$ by $x = r_f(P)$, then we also have
to decrease the flow value on $(v,u)$ by $x$ because of the skew symmetry property. We will
also say that we augment the flow $f$ along path $P$. See Figure 6 for an example.

Lemma 5.2. Let $f$ be a flow in $G$ and let $P$ be an augmenting path in $G_f$. Then
$f' : V \times V \to \mathbb{R}$ with

\[
f'(u,v) =
\begin{cases}
f(u,v) + r_f(P) & \text{if } (u,v) \in P \\
f(u,v) - r_f(P) & \text{if } (v,u) \in P \\
f(u,v) & \text{otherwise}
\end{cases}
\]

is a flow in $G$ of value $|f'| = |f| + r_f(P)$.

Proof. Observe that $f'$ can be decomposed into the original flow $f$ and a flow $f_P$ that
sends $r_f(P)$ units of flow along $P$ and $-r_f(P)$ flow units along the reversed path of $P$, i.e.,
the path that we obtain from $P$ if we reverse the direction of every edge $e \in P$. Clearly, $f_P$
is a flow in $G_f$ of value $r_f(P)$. By Lemma 5.1, the combined flow $f' = f + f_P$ is a flow in
$G$ of value $|f'| = |f| + r_f(P)$. □

5.3 Ford-Fulkerson Algorithm

The observations above already suggest a first algorithm for the max-flow problem:
Initialize $f$ to be the zero flow, i.e., $f(u,v) = 0$ for all $u, v \in V$. Let $G_f$ be the residual graph
with respect to $f$. If there exists an augmenting path $P$ in the residual graph $G_f$, then
augment $f$ along $P$ and repeat; otherwise terminate. This algorithm is also known as the
Ford-Fulkerson algorithm and is summarized in Algorithm 7.

Note that it is not clear that the algorithm terminates nor that the computed flow is of
maximum value. The correctness of the algorithm will follow from the max-flow min-cut
theorem (Theorem 5.3) discussed in the next section.

Input: directed graph $G = (V,E)$, capacity function $c : E \to \mathbb{R}_+$, source and
destination nodes $s, t \in V$
Output: maximum flow $f : V \times V \to \mathbb{R}$

Initialize: $f(u,v) = 0$ for every $u, v \in V$
while there exists an augmenting path $P$ in $G_f$ do
    augment flow $f$ along $P$
end
return $f$

Algorithm 7: Ford-Fulkerson algorithm for the max-flow problem.

The running time of the algorithm depends on the number of iterations that we need
to perform. Every single iteration can be implemented to run in time $O(m)$. If all edge
capacities are integral, then it is easy to see that after each iteration the flow value increases
by at least one. The total number of iterations is therefore at most $|f^*|$, where $f^*$ is a
maximum flow. Note that we can also handle the case when each capacity is a rational
number by scaling all capacities by a suitable integer $D$. However, note that the worst case
running time of the Ford-Fulkerson algorithm can be prohibitively large. An instance on
which the algorithm admits a bad running time is given in Figure 7.

Theorem 5.1. The Ford-Fulkerson algorithm solves the max-flow problem with integer
capacities in time $O(m|f^*|)$, where $f^*$ is a maximum flow.

Ford and Fulkerson gave an instance of the max-flow problem that shows that for irrational 
capacities the algorithm might fail to terminate. 
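
The loop of Algorithm 7 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `ford_fulkerson`, the dictionary encoding of the graph and the use of depth-first search to find some augmenting path are choices made here, and integer capacities are assumed so that the loop terminates.

```python
from collections import defaultdict

def ford_fulkerson(capacity, s, t):
    """capacity: dict mapping (u, v) -> nonnegative integer capacity (assumed encoding)."""
    # Neighbours in both directions, because residual edges may point backwards.
    adj = defaultdict(set)
    for (u, v) in capacity:
        adj[u].add(v)
        adj[v].add(u)
    flow = defaultdict(int)  # skew-symmetric: flow[u, v] == -flow[v, u]

    def residual(u, v):
        return capacity.get((u, v), 0) - flow[u, v]

    def find_path():
        # Iterative depth-first search for any s,t-path in the residual graph.
        parent = {s: None}
        stack = [s]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in parent and residual(u, v) > 0:
                    parent[v] = u
                    stack.append(v)
        if t not in parent:
            return None
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        return path[::-1]

    value = 0
    while (path := find_path()) is not None:
        x = min(residual(u, v) for u, v in path)  # residual capacity r_f(P)
        for u, v in path:
            flow[u, v] += x
            flow[v, u] -= x  # maintain skew symmetry
        value += x
    return value, dict(flow)
```

On the instance of Figure 7 the returned value is $2B$ regardless of which paths the search happens to pick; only the number of augmentations depends on that choice.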

Note that if all capacities are integers then the algorithm maintains an integral flow. That
is, the Ford-Fulkerson algorithm also gives an algorithmic proof of the following integrality
property.

Theorem 5.2 (Integrality property). If all capacities are integral, then there is an integer 
maximum flow. 

5.4 Max-Flow Min-Cut Theorem 

A cut (or $s,t$-cut) of $G$ is a partition of the node set $V$ into two sets $X$ and $\bar{X} = V \setminus X$ such that $s \in X$ and $t \in \bar{X}$. Recall that $G$ is directed. Thus, there are two types of edges crossing the cut $(X,\bar{X})$, namely the ones that leave $X$ and the ones that enter $X$. As for flows, it will be convenient to define for $X, Y \subseteq V$,

\[ c(X,Y) = \sum_{u \in X} \sum_{v \in Y} c(u,v). \]

The capacity of a cut $(X,\bar{X})$ is defined as the total capacity $c(X,\bar{X})$ of the edges leaving $X$. Fix an arbitrary flow $f$ in $G$ and consider an arbitrary cut $(X,\bar{X})$ of $G$. The total net flow leaving $X$ is $|f|$.

Lemma 5.3. Let $f$ be a flow and let $(X,\bar{X})$ be a cut of $G$. Then the net flow leaving $X$ is $f(X,\bar{X}) = |f|$.

Proof.

\[ f(X, V \setminus X) = f(X,V) - f(X,X) = f(X,V) = f(s,V) + f(X \setminus \{s\}, V) = f(s,V) = |f|. \]

Here $f(X,X) = 0$ by skew symmetry and $f(X \setminus \{s\}, V) = 0$ by flow conservation, because $t \notin X$.

□

Figure 7: A bad instance for the Ford-Fulkerson algorithm (left). Suppose that $B$ is a large integer. The algorithm alternately augments one unit of flow along the two paths $(s,u,p,t)$ and $(s,p,u,t)$. The flow after two augmentations is shown on the right. The algorithm needs $2B$ augmentations to find a maximum flow.

Intuitively, it is clear that if we consider an arbitrary cut $(X,\bar{X})$ of $G$, then the total flow $f(X,\bar{X})$ that leaves $X$ is at most $c(X,\bar{X})$. The next lemma shows this formally.

Lemma 5.4. The flow value of any flow $f$ in $G$ is at most the capacity of any cut $(X,\bar{X})$ of $G$, i.e., $f(X,\bar{X}) \le c(X,\bar{X})$.

Proof. By Lemma 5.3, we have

\[ |f| = f(X,\bar{X}) = \sum_{u \in X} \sum_{v \in V \setminus X} f(u,v) \le \sum_{u \in X} \sum_{v \in V \setminus X} c(u,v) = c(X,\bar{X}). \]

□

A fundamental result for flows is that the value of a maximum flow is equal to the mini- 
mum capacity of a cut. 

Theorem 5.3 (Max-Flow Min-Cut Theorem). Let $f$ be a flow in $G$. Then the following
conditions are equivalent:

1. $f$ is a maximum flow of $G$.

2. The residual graph $G_f$ contains no augmenting path.

3. $|f| = c(X,\bar{X})$ for some cut $(X,\bar{X})$ of $G$.

Proof. (1) $\Rightarrow$ (2): Suppose for the sake of contradiction that $f$ is a maximum flow of $G$ and there is an augmenting path $P$ in $G_f$. By Lemma 5.2, we can augment $f$ along $P$ and obtain a flow of value strictly larger than $|f|$, which is a contradiction.

(2) $\Rightarrow$ (3): Suppose that $G_f$ contains no augmenting path. Let $X$ be the set of nodes that are reachable from $s$ in $G_f$. Note that $t \notin X$ because there is no path from $s$ to $t$ in $G_f$. That is, $(X,\bar{X})$ is a cut of $G$. By Lemma 5.3, $|f| = f(X,\bar{X})$. Moreover, for every $u \in X$ and $v \in \bar{X}$, we must have $f(u,v) = c(u,v)$ because otherwise $(u,v) \in E_f$ and $v$ would be part of $X$. We conclude $|f| = f(X,\bar{X}) = c(X,\bar{X})$.

(3) $\Rightarrow$ (1): By Lemma 5.4, the value of any flow is at most the capacity of any cut. The condition $|f| = c(X,\bar{X})$ thus implies that $f$ is a maximum flow. (Also note that this implies that $(X,\bar{X})$ must be a cut of minimum capacity.) □
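
The step (2) $\Rightarrow$ (3) is constructive: once a maximum flow is known, a minimum cut is obtained by collecting the nodes reachable from $s$ in the residual graph. The following Python sketch illustrates this. The function name `min_cut_from_flow` and the dictionary encoding are assumptions made for illustration, and the flow is expected to be supplied by some max-flow routine.

```python
from collections import deque

def min_cut_from_flow(capacity, flow, s):
    """capacity, flow: dicts keyed by original edges (u, v).
    The value on a reverse pair is implied by skew symmetry (assumed encoding)."""
    def residual(u, v):
        net = flow.get((u, v), 0) - flow.get((v, u), 0)  # net flow f(u, v)
        return capacity.get((u, v), 0) - net

    nodes = {x for e in capacity for x in e}
    reachable, queue = {s}, deque([s])
    while queue:
        u = queue.popleft()
        for v in nodes:
            if v not in reachable and residual(u, v) > 0:
                reachable.add(v)
                queue.append(v)
    # Edges leaving X = reachable; for a maximum flow they are all saturated.
    cut_capacity = sum(c for (u, v), c in capacity.items()
                       if u in reachable and v not in reachable)
    return reachable, cut_capacity
```

The quadratic scan over all node pairs keeps the sketch short; an adjacency-list version would run in time linear in the size of the residual graph.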



5.5 Edmonds-Karp Algorithm

The Edmonds-Karp algorithm works almost identically to the Ford-Fulkerson algorithm.
The only difference is that it chooses in each iteration a shortest augmenting path in
the residual graph $G_f$. An augmenting path is a shortest augmenting path if it has the
minimum number of edges among all augmenting paths in $G_f$. The algorithm is given in
Algorithm 8.



Input: directed graph $G = (V,E)$, capacity function $c : E \to \mathbb{R}^+$, source and destination nodes $s,t \in V$
Output: maximum flow $f : V \times V \to \mathbb{R}$

Initialize: $f(u,v) = 0$ for every $u,v \in V$
while there exists an augmenting path in $G_f$ do
    determine a shortest augmenting path $P$ in $G_f$
    augment $f$ along $P$
end
return $f$

Algorithm 8: Edmonds-Karp algorithm for the max-flow problem.



Note that each iteration can still be implemented to run in time $O(m)$. A shortest augmenting
path in $G_f$ can be found by using a breadth-first search algorithm. As we will show,
this small change makes a big difference in terms of the running time of the algorithm.
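
A possible Python rendering of Algorithm 8 is sketched below. It is illustrative only: the name `edmonds_karp`, the dictionary encoding and the returned iteration counter are assumptions introduced here. The breadth-first search stops as soon as $t$ is discovered, which yields an augmenting path with the fewest edges.

```python
from collections import defaultdict, deque

def edmonds_karp(capacity, s, t):
    """capacity: dict (u, v) -> nonnegative capacity. Returns (flow value, #augmentations)."""
    adj = defaultdict(set)
    for u, v in capacity:
        adj[u].add(v)
        adj[v].add(u)
    flow = defaultdict(int)  # skew-symmetric net flow
    value = iterations = 0
    while True:
        # Breadth-first search in the residual graph G_f.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and capacity.get((u, v), 0) - flow[u, v] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return value, iterations  # no augmenting path is left
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        x = min(capacity.get(e, 0) - flow[e] for e in path)  # r_f(P)
        for u, v in path:
            flow[u, v] += x
            flow[v, u] -= x
        value += x
        iterations += 1
```

On the instance of Figure 7 the two shortest augmenting paths $(s,u,t)$ and $(s,p,t)$ each carry $B$ units, so two augmentations suffice instead of $2B$.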

Theorem 5.4. The Edmonds-Karp algorithm solves the max-flow problem in time
$O(nm^2)$.

Note that the correctness of the Edmonds-Karp algorithm follows from the max-flow
min-cut theorem: The algorithm halts when there is no augmenting path in $G_f$. By
Theorem 5.3, the resulting flow is a maximum flow. It remains to show that the algorithm
terminates after $O(nm)$ iterations. The crucial insight in order to prove this is that the
shortest path distance of a node from $s$ can only increase as the algorithm progresses.

Fix an arbitrary iteration of the algorithm. Let $f$ be the flow at the beginning of the iteration and let $f'$ be the flow at the end of the iteration. We obtain $f'$ from $f$ by augmenting $f$ along an augmenting path $P$ in $G_f$. Further, $P$ must be a shortest augmenting path in $G_f$. Let $P = (s = v_0, \dots, v_k = t)$. We define two distance functions (in terms of number of edges on a path): Let $\delta(u,v)$ be the number of edges on a shortest path from $u$ to $v$ in $G_f$. Similarly, let $\delta'(u,v)$ be the number of edges on a shortest path from $u$ to $v$ in $G_{f'}$.

Note that $\delta(s,v_i) = i$. Also observe that if an edge $(u,v)$ is part of $G_{f'}$ but not part of $G_f$, then $u = v_i$ and $v = v_{i-1}$ for some $i$. To see this, observe that $f(u,v) = c(u,v)$ because $(u,v) \notin E_f$. On the other hand, $f'(u,v) < c(u,v)$ because $(u,v) \in E_{f'}$. That is, by augmenting $f$ along $P$, the flow on edge $(u,v)$ was decreased. That means that the flow on the reverse edge $(v,u)$ was increased and thus $(v,u)$ must be part of $P$.

The next lemma shows that for every node $v \in V$, the shortest path distance from $s$ to $v$ does not decrease.

Lemma 5.5. For each $v \in V$, $\delta'(s,v) \ge \delta(s,v)$.

Proof. Suppose there exists a node $v$ with $\delta'(s,v) < \delta(s,v)$. Among all such nodes, let $v$ be one with $\delta'(s,v)$ being smallest. Note that $v \ne s$ because $\delta(s,s) = \delta'(s,s) = 0$ by definition. Let $P'$ be a shortest $s,v$-path in $G_{f'}$ of distance $\delta'(s,v)$ and let $u$ be the second-last node of $P'$. Because $P'$ is a shortest path and by the choice of $v$, we have

\[ \delta(s,v) > \delta'(s,v) = \delta'(s,u) + 1 \ge \delta(s,u) + 1. \tag{5} \]

Note that the distance function $\delta$ satisfies the triangle inequality. Therefore, edge $(u,v)$ cannot be part of $G_f$ because otherwise we would have $\delta(s,v) \le \delta(s,u) + 1$. That is, $(u,v)$ is contained in $G_{f'}$ but not contained in $G_f$. Using our observation above, we conclude that there is some $i$ ($1 \le i \le k$) such that $u = v_i$ and $v = v_{i-1}$. But then $\delta(s,v) = i - 1$ and $\delta(s,u) = i$, which is a contradiction to (5). □

Consider an augmentation of the current flow $f$ along path $P$. We say that an edge $e = (u,v)$ is critical with respect to $f$ and $P$ if $(u,v)$ is part of the augmenting path $P$ and its residual capacity coincides with the amount of flow that is pushed along $P$, i.e., $r_f(u,v) = r_f(P)$. Note that after the augmentation of $f$ along $P$, every critical edge on $P$ will be saturated and thus vanish from the residual network.

Lemma 5.6. The number of times an edge $e = (u,v)$ is critical throughout the execution of the algorithm is bounded by $O(n)$.

Proof. Suppose edge $e = (u,v)$ is critical with respect to flow $f$ and path $P$. Let $\delta$ refer to the shortest path distances in $G_f$. We have

\[ \delta(s,v) = \delta(s,u) + 1. \]

After the augmentation of $f$ along $P$, edge $e$ is saturated and thus disappears from the residual graph. It can only reappear in the residual graph when in a successive iteration some positive flow is pushed over the reverse edge $(v,u)$. Suppose edge $(v,u)$ is part of an augmenting path $P'$ that is used to augment the current flow, say $f'$. Let $\delta'$ be the distance function with respect to $G_{f'}$. We have

\[ \delta'(s,u) = \delta'(s,v) + 1. \]

By Lemma 5.5, $\delta(s,v) \le \delta'(s,v)$ and thus

\[ \delta'(s,u) = \delta'(s,v) + 1 \ge \delta(s,v) + 1 = \delta(s,u) + 2. \]

That is, between any two augmentations for which edge $e = (u,v)$ is critical, the distance of $u$ from $s$ must increase by at least 2.

Note that the distance of $u$ from $s$ is at least 0 initially and can never be more than $n - 2$. The number of times edge $e = (u,v)$ is critical is thus bounded by $O(n)$. □



The proof of Theorem 5.4 now follows trivially: 

Proof of Theorem 5.4. As argued above, the Edmonds-Karp algorithm computes a maximum flow if it terminates. Note that in every iteration of the algorithm at least one edge is critical. By Lemma 5.6, every edge is critical at most $O(n)$ times. Since the residual graphs contain at most $2m$ distinct edges, the number of iterations is bounded by $O(nm)$. Each iteration takes time $O(m)$, which gives a total running time of $O(nm^2)$. □

References 

The presentation of the material in this section is based on [3, Chapter 27] and [2, Chapter 3].

6. Minimum Cost Flows



6.1 Introduction 

We consider the minimum cost flow problem. 

Minimum Cost Flow Problem:

Given: A directed graph $G = (V,E)$ with capacities $w : E \to \mathbb{R}^+$, costs $c : E \to \mathbb{R}^+$ and a balance function $b : V \to \mathbb{R}$.
Goal: Compute a feasible flow $f$ such that the overall cost $\sum_{e \in E} c(e) f(e)$ is minimized.

Here, a flow $f : E \to \mathbb{R}^+$ is said to be feasible if it respects the capacity constraints and the total flow at every node $u \in V$ is equal to the balance $b(u)$. More formally, $f$ is feasible if the following two conditions are satisfied:

1. Capacity constraint: for every $(u,v) \in E$, $f(u,v) \le w(u,v)$.

2. Flow balance constraints: for every $u \in V$,

\[ \sum_{(u,v) \in E} f(u,v) - \sum_{(v,u) \in E} f(v,u) = b(u). \]

Intuitively, a positive balance indicates that node $u$ has a supply of $b(u)$ units of flow, while a negative balance indicates that node $u$ has a demand of $-b(u)$ units of flow. A feasible flow $f$ that satisfies the flow balance constraints with $b(u) = 0$ for every $u \in V$ is called a circulation.

The minimum cost flow problem can naturally be formulated as a linear program:

\[
\begin{aligned}
\text{minimize} \quad & \sum_{e \in E} c(e) f(e) \\
\text{subject to} \quad & \sum_{(u,v) \in E} f(u,v) - \sum_{(v,u) \in E} f(v,u) = b(u) && \forall u \in V \\
& f(e) \le w(e) && \forall e \in E \\
& f(e) \ge 0 && \forall e \in E
\end{aligned}
\tag{6}
\]

We use $c(f) = \sum_{e \in E} c(e) f(e)$ to refer to the total cost of a feasible flow $f$. We make a few
assumptions throughout this section:

Assumption 6.1. Capacities, costs and balances are integral. 

Note that we can enforce this assumption if all input numbers are rational numbers by 
multiplying by a suitably large constant. 

Assumption 6.2. The balance function satisfies $\sum_{u \in V} b(u) = 0$ and there is a feasible flow satisfying these balances.

Note that we can test whether a feasible flow exists by a single max-flow computation as follows: Augment the network by adding a super-source $s$ and a super-target $t$. Add an edge $(s,u)$ for every node $u \in V$ with $b(u) > 0$ of capacity $w(s,u) = b(u)$. Similarly, add an edge $(u,t)$ for every node $u \in V$ with $b(u) < 0$ of capacity $w(u,t) = -b(u)$. Now, compute a maximum $s,t$-flow in the augmented network. It is not hard to see that there is a feasible flow for the original instance if and only if the maximum flow saturates all edges out of $s$ (or, equivalently, into $t$).

Figure 8: (a) Minimum cost flow instance. Every edge $(u,v)$ is labeled with $c(u,v), w(u,v)$ and every node $u$ is labeled with $b(u)$. (b) Augmented network to test feasibility. Every edge $(u,v)$ is labeled with $w(u,v)$.
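
The construction of the augmented network can be sketched in Python as follows. The function name `build_feasibility_network` and the node labels `"s*"` and `"t*"` are placeholders chosen for this illustration; the result is meant to be handed to any max-flow routine, and the instance is feasible exactly when the maximum flow value equals the returned total supply.

```python
def build_feasibility_network(w, b, source="s*", sink="t*"):
    """w: dict (u, v) -> capacity; b: dict node -> balance.
    The labels "s*" and "t*" are hypothetical and must not clash with existing nodes."""
    capacity = dict(w)
    for u, balance in b.items():
        if balance > 0:
            capacity[source, u] = balance   # supply of u enters through (s, u)
        elif balance < 0:
            capacity[u, sink] = -balance    # demand of u leaves through (u, t)
    total_supply = sum(balance for balance in b.values() if balance > 0)
    return capacity, source, sink, total_supply
```

A maximum flow of value `total_supply` saturates every edge out of the super-source, which is the condition stated above.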

Subsequently, let $W = \max_{e \in E} w(e)$ refer to the maximum capacity of an edge, $C = \max_{e \in E} c(e)$ to the maximum edge cost and $B = \max_{u \in V} b(u)$ to the maximum balance.

6.2 Flow Decomposition and Residual Graph 

We establish a few basic properties of flows and circulations and introduce the important 
concept of residual graphs. 

Lemma 6.1. Let $f$ be a circulation of $G$. Then $f$ can be decomposed into at most $m = |E|$ directed simple cycle flows.

Proof. Let $G^+ = (V,E^+)$ be the support subgraph of $G$ that contains all edges with positive flow value with respect to $f$, i.e.,

\[ E^+ = \{ e \in E \mid f(e) > 0 \}. \]

Consider an arbitrary directed simple cycle $C$ in $G^+$. (Such a cycle exists whenever $E^+ \ne \emptyset$ because, by flow conservation, every node that is entered by an edge of $E^+$ is also left by an edge of $E^+$.) Let $x$ be the smallest flow value of an edge in $C$, i.e., $x = \min_{e \in C} f(e)$. We can decompose $f$ such that $f = f' + f_C$, where $f_C(u,v) = x$ for every $(u,v) \in C$ and $f_C(u,v) = 0$ otherwise. Note that $f'$ is a circulation and $f_C$ is a cycle flow. We can now repeat this procedure with $f'$ instead of $f$. Note that at least one edge $e$ of $C$ must satisfy $f'(e) = 0$ and thus vanishes from the support graph of $f'$. After at most $m$ iterations, we therefore obtain a decomposition of $f$ consisting of at most $m$ directed simple cycle flows. □
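
The peeling argument in this proof translates directly into a procedure. The Python sketch below is an illustration under assumptions: the name `decompose_circulation` is hypothetical, and flow values are taken to be integers (or exact rationals) so that the test for a vanished edge is exact.

```python
def decompose_circulation(f):
    """f: dict (u, v) -> nonnegative flow value that satisfies conservation at every node.
    Returns a list of (cycle, amount) pairs, where cycle is a list of edges."""
    f = {e: x for e, x in f.items() if x > 0}  # edge set E^+ of the support graph
    cycles = []
    while f:
        # One outgoing support edge per node suffices to trace a cycle.
        out = {}
        for (u, v) in f:
            out.setdefault(u, v)
        start = next(iter(out))
        walk, position = [start], {start: 0}
        while True:
            nxt = out[walk[-1]]  # exists by flow conservation
            if nxt in position:
                nodes = walk[position[nxt]:]
                break
            position[nxt] = len(walk)
            walk.append(nxt)
        cycle = list(zip(nodes, nodes[1:] + [nodes[0]]))
        x = min(f[e] for e in cycle)  # smallest flow value on the cycle
        cycles.append((cycle, x))
        for e in cycle:
            f[e] -= x
            if f[e] == 0:
                del f[e]  # at least one edge leaves the support graph
    return cycles
```

Each round removes at least one edge from the support graph, so the returned list has at most $m$ entries, in line with the lemma.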

As for the maximum flow problem, the concept of a residual graph will play a crucial role: Suppose $f$ is a feasible flow of $G$. We introduce for each edge $e = (u,v) \in E$ the reverse edge $(v,u)$ with cost $c(v,u) = -c(u,v)$. Subsequently, these edges will be called backward edges. In contrast, we refer to the original edges $(u,v) \in E$ as forward edges. The residual capacity of a forward edge $(u,v)$ is defined as $r_f(u,v) = w(u,v) - f(u,v)$. The residual capacity of a backward edge $(v,u)$ is $r_f(v,u) = f(u,v)$. The residual graph $G_f = (V,E_f)$ with respect to $f$ is the graph that contains all edges with positive residual capacity.

Consider a directed simple cycle $C$ in the residual graph $G_f$. Let the residual capacity of $C$ be $r_f(C) = \min_{e \in C} r_f(e)$. We can then push $x = r_f(C)$ additional units of flow along $C$ to obtain a feasible flow $f'$. Observe that an increase of $x$ units on a backward edge $(v,u)$ corresponds to a decrease of $x$ units on the forward edge $(u,v) \in E$. More formally, the flow $f'$ that we obtain from $f$ by augmenting $x$ units of flow along $C$ is defined as follows: for every edge $e = (u,v) \in E$, we have

\[ f'(u,v) = \begin{cases} f(u,v) + x & \text{if } (u,v) \in C \\ f(u,v) - x & \text{if } (v,u) \in C \\ f(u,v) & \text{otherwise.} \end{cases} \]

Let the total cost of a cycle $C$ in $G_f$ be $c(C) = \sum_{e \in C} c(e)$.

Lemma 6.2. Let $f$ be a feasible flow of $G$ and let $C$ be a directed simple cycle in $G_f$. Suppose $f'$ is a flow that is obtained from $f$ by augmenting $x = r_f(C)$ units of flow along $C$. Then $f'$ is a feasible flow of $G$. Moreover, we have $c(f') = c(f) + x \cdot c(C)$.

Proof. Observe that $x \le r_f(u,v)$ for every edge $(u,v) \in C$. If $(u,v) \in C$ is a forward edge, then $f'(u,v) = f(u,v) + x \le f(u,v) + w(u,v) - f(u,v) = w(u,v)$. If $(v,u) \in C$ is a backward edge, then $f'(u,v) = f(u,v) - x \ge f(u,v) - f(u,v) = 0$. The new flow $f'$ therefore respects the capacity and non-negativity constraints.

Note that by pushing $x$ units of flow along $C$, the flow at a node $u$ that is not part of $C$ remains the same. Consider a node $u$ that is part of $C$. Because $C$ is simple there are exactly two edges of $C$ incident to $u$, say $e_1$ and $e_2$. Note that the flow on all other edges incident to $u$ remains the same. Also, the flow at $u$ remains the same by pushing $x$ units of flow along $e_1$ and $e_2$. (Note that in order to verify this we need to consider four different cases, depending on whether $e_1$ and $e_2$ are forward or backward edges.) We therefore have

\[ \sum_{(u,v) \in E} f'(u,v) - \sum_{(v,u) \in E} f'(v,u) = \sum_{(u,v) \in E} f(u,v) - \sum_{(v,u) \in E} f(v,u) = b(u). \]

The flow balance constraints are therefore satisfied.

Finally, observe that by pushing $x$ units of flow along $C$ we effectively increase the cost of the flow by $x \cdot c(u,v)$ for every forward edge $(u,v) \in C \cap E$ and decrease the cost of the flow by $x \cdot c(u,v)$ for every backward edge $(v,u) \in C \setminus E$. The total cost of $f'$ is

\[
\begin{aligned}
c(f') = \sum_{e \in E} c(e) f'(e) &= \sum_{e \in E} c(e) f(e) + \sum_{(u,v) \in C \cap E} x \cdot c(u,v) - \sum_{(v,u) \in C \setminus E} x \cdot c(u,v) \\
&= c(f) + x \sum_{e \in C} c(e) = c(f) + x \cdot c(C).
\end{aligned}
\]

□

We can generalize the above lemma as follows:

Lemma 6.3. Let $f$ be a feasible flow of $G$ and let $g$ be a circulation of $G_f$ that respects the residual capacities $r_f$. Then a feasible flow $f'$ of $G$ can be obtained from $f$ by augmenting along $k$ simple cycles $C_1, \dots, C_k$ with $k \le 2m$ such that $c(f') = c(f) + c(g)$, where $c(g) = \sum_{e \in E_f} c(e) g(e)$ is the total cost of $g$ in the residual graph $G_f$.

Proof. Using Lemma 6.1, we can decompose $g$ into $k$ directed simple cycle flows $f_{C_1}, \dots, f_{C_k}$ with $k \le 2m$. (Recall that $G_f$ has at most $2m$ edges.) Each cycle $C_i$ corresponds to a directed cycle in $G_f$. By pushing $f_{C_i}$ units of flow along every cycle $C_i$ ($1 \le i \le k$), we obtain a new flow $f'$.² From Lemma 6.2 it follows that $f'$ is a feasible flow of $G$ of total cost

\[ c(f') = c(f) + \sum_{i=1}^{k} f_{C_i} \cdot c(C_i) = c(f) + \sum_{e \in E_f} c(e) g(e) = c(f) + c(g). \]

□

Lemma 6.4. Let $f$ and $f'$ be two feasible flows of $G$. Then $f'$ can be obtained from $f$ by augmenting flow along at most $m$ cycles in $G_f$.

Proof. Consider the difference $h = f' - f$. Let $E^+$ be the set of edges $(u,v) \in E$ with $h(u,v) > 0$. Similarly, let $E^-$ be the set of edges $(u,v) \in E$ with $h(u,v) < 0$. Define a flow $g$ as follows: $g(u,v) = h(u,v)$ for every edge $(u,v) \in E^+$ and $g(v,u) = -h(u,v)$ for every edge $(u,v) \in E^-$. We claim that $g$ is a circulation in $G_f$.

Note that for every edge $(u,v) \in E^+$ we have $0 < g(u,v) = h(u,v) = f'(u,v) - f(u,v) \le w(u,v) - f(u,v) = r_f(u,v)$. Thus, $(u,v) \in E_f$ and $g(u,v) \le r_f(u,v)$. Similarly, for every edge $(u,v) \in E^-$ we have $0 < g(v,u) = -h(u,v) = f(u,v) - f'(u,v) \le f(u,v) = r_f(v,u)$. Thus, $(v,u) \in E_f$ and $g(v,u) \le r_f(v,u)$. The flow $g$ therefore respects the residual capacities of $G_f$.

It remains to be shown that $g$ satisfies the flow balance constraints: Note that because both $f'$ and $f$ are feasible flows in $G$ we have for every node $u \in V$

\[ \sum_{(u,v) \in E} h(u,v) - \sum_{(v,u) \in E} h(v,u) = 0. \]

Using this and the definition of $h$, we obtain

\[
\begin{aligned}
0 &= \sum_{(u,v) \in E^+} g(u,v) - \sum_{(u,v) \in E^-} g(v,u) - \sum_{(v,u) \in E^+} g(v,u) + \sum_{(v,u) \in E^-} g(u,v) \\
&= \sum_{(u,v) \in E_f} g(u,v) - \sum_{(v,u) \in E_f} g(v,u).
\end{aligned}
\]

Here the second equality follows from the observations above: for every $(u,v) \in E^+$ we have $(u,v) \in E_f$, for every $(u,v) \in E^-$ we have $(v,u) \in E_f$, and $g$ is non-zero only on the residual edges that arise from $E^+$ and $E^-$. The first and fourth sums therefore account for all flow of $g$ leaving $u$, and the second and third sums for all flow of $g$ entering $u$. We conclude that $g$ is a circulation of $G_f$.

The proof now follows from Lemma 6.3. (Note that there are at most $m$ edges with positive flow in $g$. Thus, $g$ can be decomposed into at most $m$ cycle flows.) □

² We slightly abuse notation here and let $f_{C_i}$ also refer to the flow value that is pushed along $C_i$.

6.3 Cycle Canceling Algorithm 

Lemma 6.2 shows that if we are able to find a cycle $C$ in $G_f$ of negative cost $c(C) < 0$, then we can augment $r_f(C)$ units of flow along this cycle and obtain a flow $f'$ of cost strictly smaller than $c(f)$. This observation gives rise to our first optimality condition:

Theorem 6.1 (Negative cycle optimality condition). A feasible flow $f$ of $G$ is a minimum cost flow if and only if $G_f$ does not contain a negative cost cycle.

Proof. Suppose $f$ is a minimum cost flow and $G_f$ contains a negative cost cycle $C$. By Lemma 6.2, we can augment $r_f(C)$ units of flow along $C$ and obtain a feasible flow $f'$ with

\[ c(f') = c(f) + r_f(C) \cdot c(C) < c(f), \]

which is a contradiction.

Next suppose that $f$ is a feasible flow and $G_f$ contains no negative cycle. Let $f^*$ be a minimum cost flow and assume $f^* \ne f$. By Lemma 6.4, $f^*$ can be obtained from $f$ by augmenting along $k$ cycles $C_1, \dots, C_k$ in $G_f$, where $k \le m$. Let $f_{C_i}$ be the flow that is pushed along $C_i$. By Lemma 6.3, the cost of $f^*$ is equal to

\[ c(f^*) = c(f) + \sum_{i=1}^{k} f_{C_i} \cdot c(C_i). \]

By assumption, each such cycle has nonnegative cost and thus $c(f^*) \ge c(f)$. We conclude that $f$ is a minimum cost flow. □

This leads to our first algorithm:

Input: directed graph $G = (V,E)$, capacity function $w : E \to \mathbb{R}^+$, cost function $c : E \to \mathbb{R}^+$ and balance function $b : V \to \mathbb{R}$.
Output: minimum cost flow $f : E \to \mathbb{R}^+$.

Initialize: compute a feasible flow $f$
while $G_f$ contains a negative cost cycle do
    find a directed simple negative cycle $C$ of $G_f$
    push $r_f(C)$ flow units along $C$ and let $f$ be the new flow
end
return $f$

Algorithm 9: Cycle canceling algorithm.

Note that we can establish a feasible flow $f$ by computing a maximum flow as explained above. This takes $O(nm^2)$ time using the Edmonds-Karp algorithm. Also observe that in each iteration we have to determine a cycle of negative cost in $G_f$. Using the Bellman-Ford algorithm, this can be done in time $O(nm)$.

We next bound the number of iterations that the algorithm needs to compute a minimum cost flow. Note that an arbitrary flow has cost at most $mWC$ because every edge has flow value at most $W$ and cost at most $C$. On the other hand, a trivial lower bound on the cost of a minimum cost flow is 0, because all edge costs are nonnegative. Every iteration of the above algorithm strictly decreases the cost of the current flow $f$. Since we assume that all input data is integral, the cost of $f$ decreases by at least 1. The algorithm therefore terminates after at most $O(mWC)$ iterations.
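
The following Python sketch illustrates Algorithm 9. It is not a reference implementation: the name `cycle_canceling` and the data layout are assumptions, the initial feasible flow is expected as input, and the edge set is assumed to contain no pair of antiparallel edges so that every residual edge is identified by its endpoints. Negative cycles are detected with Bellman-Ford started from a virtual source at distance 0 to every node.

```python
def cycle_canceling(w, c, f):
    """w, c: dicts (u, v) -> capacity / cost; f: feasible integral flow with a value
    for every edge. Assumes E contains no antiparallel pair of edges."""
    f = dict(f)
    nodes = sorted({x for e in w for x in e})

    def residual_edges():
        # residual edge -> (cost, residual capacity, original edge, direction)
        res = {}
        for (u, v), cap in w.items():
            if f[u, v] < cap:
                res[u, v] = (c[u, v], cap - f[u, v], (u, v), +1)   # forward edge
            if f[u, v] > 0:
                res[v, u] = (-c[u, v], f[u, v], (u, v), -1)        # backward edge
        return res

    def negative_cycle(res):
        dist = {v: 0 for v in nodes}
        pred = {v: None for v in nodes}
        last = None
        for _ in range(len(nodes)):
            last = None
            for (a, b), (cost, *_rest) in res.items():
                if dist[a] + cost < dist[b]:
                    dist[b], pred[b], last = dist[a] + cost, a, b
            if last is None:
                return None            # a full round without relaxation: no negative cycle
        for _ in range(len(nodes)):    # step back n times to land on the cycle
            last = pred[last]
        cycle, v = [], last
        while True:
            cycle.append((pred[v], v))
            v = pred[v]
            if v == last:
                return cycle[::-1]

    while True:
        res = residual_edges()
        cycle = negative_cycle(res)
        if cycle is None:
            return f
        x = min(res[e][1] for e in cycle)      # residual capacity r_f(C)
        for e in cycle:
            _, _, original, sign = res[e]
            f[original] += sign * x
```

For example, starting from a flow that routes two units over a direct edge of cost 5 while a two-edge detour of total cost 2 is unused, a single cancellation moves both units onto the detour.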

Theorem 6.2. The cycle canceling algorithm computes a minimum cost flow in time
$O(nm^2WC)$.

Note that the running time of the cycle canceling algorithm is not polynomial because W 
and C might be exponential in n and m. Algorithms whose running time is polynomial 
in the input size (here n and m) and the magnitude of the largest number in the instance 
(here W and C) are said to have pseudo-polynomial running time. 

As a byproduct, the cycle canceling algorithm shows that there always exists a minimum 
cost flow that is integral if all capacities and balances are integral (see Assumption 6.1). 

Theorem 6.3 (Integrality property). If all capacities and balances are integral, then there 
is an integer minimum cost flow. 

Proof. The proof is by induction on the number of iterations. We can assume without loss 
of generality that the flow / after the initialization is integral: Recall that / is obtained 
by computing a maximum flow in an augmented network as indicated above. Because all 
capacities and balances are integral, this augmented network has integral capacities. The 
resulting flow is therefore integral by Theorem 5.2. Suppose that the current flow / is 
integral after i iterations. The residual capacities in Gf are then also integral and a push 
along an augmenting cycle maintains the integrality of the resulting flow. □ 

We remark that the above algorithm can be turned into a polynomial-time algorithm if in each iteration one augments along a minimum mean cost cycle, i.e., a cycle that minimizes the ratio $c(C)/|C|$. A minimum mean cost cycle can be computed in time $O(nm)$. Using this idea, one can show that the resulting algorithm has an overall running time of $O(n^2 m^3 \log n)$.

6.4 Successive Shortest Path Algorithm 

We derive an alternative optimality condition. Suppose we associate a potential $\pi(u)$ with every node $u \in V$. Define the reduced cost $c^\pi(u,v)$ of an edge $(u,v)$ as

\[ c^\pi(u,v) = c(u,v) - \pi(u) + \pi(v). \]

Note that this definition is applicable to both the original network and the residual graph.

Theorem 6.4 (Reduced cost optimality conditions). A feasible flow $f$ of $G$ is a minimum cost flow if and only if there exist some node potentials $\pi : V \to \mathbb{R}$ such that $c^\pi(u,v) \ge 0$ for every edge $(u,v) \in E_f$ of $G_f$.

Proof. Suppose that there exist node potentials such that $c^\pi(u,v) \ge 0$ for every edge $(u,v) \in E_f$ of the residual graph $G_f$. Let $C$ be an arbitrary simple directed cycle in $G_f$. Every node of $C$ is the tail of exactly one edge of $C$ and the head of exactly one edge of $C$, so the potentials cancel when the reduced costs are summed along $C$. Then

\[ \sum_{e \in C} c(e) = \sum_{e \in C} c^\pi(e) \ge 0. \]

We conclude that $G_f$ does not contain any negative cost cycle. By Theorem 6.1, $f$ is a minimum cost flow.

Let $f$ be a minimum cost flow. By Theorem 6.1, $G_f$ contains no negative cycle. Let $\delta : V \to \mathbb{R}$ be the shortest path distances from an arbitrarily chosen source node $s \in V$ to every other node $u \in V$ (with respect to $c$). Note that $\delta$ is well-defined because $G_f$ contains no negative cycles. The distance function $\delta$ must satisfy the triangle inequality (see Lemma 4.2), i.e., for every edge $(u,v) \in E_f$, $\delta(v) \le \delta(u) + c(u,v)$. Define $\pi(u) = -\delta(u)$ for every node $u \in V$. With this definition, we have for every $(u,v) \in E_f$:

\[ c^\pi(u,v) = c(u,v) - \pi(u) + \pi(v) = c(u,v) + \delta(u) - \delta(v) \ge 0, \]

which concludes the proof. □
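
The second half of this proof can be checked numerically. The Python sketch below computes shortest path distances with Bellman-Ford on a residual graph without negative cycles, sets $\pi = -\delta$ and evaluates the reduced costs. The function names and the assumption that every node is reachable from $s$ are made for illustration only.

```python
def potentials_from_distances(cost, s):
    """cost: dict (u, v) -> cost of a residual edge. Assumes no negative cycle
    and that every node is reachable from s."""
    nodes = {x for e in cost for x in e}
    dist = {v: float("inf") for v in nodes}
    dist[s] = 0
    for _ in range(len(nodes) - 1):              # Bellman-Ford rounds
        for (u, v), c_uv in cost.items():
            if dist[u] + c_uv < dist[v]:
                dist[v] = dist[u] + c_uv
    return {v: -dist[v] for v in nodes}          # pi(u) = -delta(u)

def reduced_cost(cost, pi, u, v):
    # c^pi(u, v) = c(u, v) - pi(u) + pi(v)
    return cost[u, v] - pi[u] + pi[v]
```

With the residual costs $c(a,s) = -1$, $c(t,a) = -1$ and $c(s,t) = 5$, the distances from $s$ are $\delta(t) = 5$ and $\delta(a) = 4$, and all three reduced costs come out nonnegative.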

We next introduce the notion of a pseudoflow. A pseudoflow $x$ of $G$ is a function $x : E \to \mathbb{R}^+$ that satisfies the nonnegativity and capacity constraints; it need not satisfy the flow balance constraints. Given a pseudoflow $x$, define the excess of a node $u \in V$ as

\[ exs(u) = b(u) + \sum_{(v,u) \in E} x(v,u) - \sum_{(u,v) \in E} x(u,v). \]

Intuitively, $exs(u) > 0$ means that node $u$ has an excess of $exs(u)$ units of flow; $exs(u) < 0$ means that node $u$ has a deficit of $-exs(u)$ units of flow. We refer to such nodes as excess and deficit nodes, respectively. A node $u$ with $exs(u) = 0$ is said to be balanced. Let $V_x^+$ and $V_x^-$, respectively, be the sets of excess and deficit nodes with respect to $x$. (For notational convenience, we will omit the subscript $x$ subsequently.) Observe that

\[ \sum_{u \in V} exs(u) = \sum_{u \in V} b(u) = 0 \quad \text{and thus} \quad \sum_{u \in V^+} exs(u) = - \sum_{u \in V^-} exs(u). \tag{7} \]

That is, if the network contains an excess node then it must also contain a deficit node.

The residual graph $G_x$ of a pseudoflow $x$ is defined in the same way as we defined the residual graph of a flow.

Lemma 6.5. Suppose that a pseudoflow $x$ satisfies the reduced cost optimality conditions with respect to some node potentials $\pi$. Let $\delta : V \to \mathbb{R}$ be the shortest path distances from some node $s \in V$ to all other nodes in $G_x$ with respect to $c^\pi$ and define $\pi' = \pi - \delta$. The following holds:

1. The pseudoflow $x$ also satisfies the reduced cost optimality conditions with respect to the node potentials $\pi'$.

2. The reduced cost $c^{\pi'}(u,v)$ is zero for every edge $(u,v) \in E_x$ that is part of a shortest path from $s$ to some other node in $G_x$.



Proof. Since $x$ satisfies the reduced cost optimality conditions with respect to $\pi$, we have $c^\pi(u,v) \ge 0$ for every edge $(u,v) \in E_x$. Moreover, $\delta$ is a distance function and therefore satisfies the triangle inequality, i.e., $\delta(v) \le \delta(u) + c^\pi(u,v)$ for every $(u,v) \in E_x$. Thus, for every edge $(u,v) \in E_x$

\[
\begin{aligned}
c^{\pi'}(u,v) &= c(u,v) - (\pi(u) - \delta(u)) + (\pi(v) - \delta(v)) \\
&= c(u,v) - \pi(u) + \pi(v) + \delta(u) - \delta(v) \\
&= c^\pi(u,v) + \delta(u) - \delta(v) \ge 0.
\end{aligned}
\]

This proves the first part of the lemma.

Consider a shortest path $P$ from node $s$ to some other node $t$ in $G_x$. Every edge $(u,v) \in P$ must be tight, i.e., $\delta(v) = \delta(u) + c^\pi(u,v)$. Substituting $c^\pi(u,v) = c(u,v) - \pi(u) + \pi(v)$, we obtain $\delta(v) = \delta(u) + c(u,v) - \pi(u) + \pi(v)$. Thus,

\[ c^{\pi'}(u,v) = c(u,v) - \pi(u) + \pi(v) + \delta(u) - \delta(v) = 0, \]

which proves the second part of the lemma. □

Corollary 6.1. Suppose that a pseudoflow $x$ satisfies the reduced cost optimality conditions and we obtain $x'$ from $x$ by sending flow along a shortest path $P$ (with respect to $c^\pi$) from node $s$ to some other node $t$ in $G_x$. Then $x'$ also satisfies the reduced cost optimality conditions.

Proof. Define the potentials $\pi' = \pi - \delta$ as in the statement of Lemma 6.5. Then $c^{\pi'}(u,v) = 0$ for every edge $(u,v) \in P$. Sending flow along an edge $(u,v) \in P$ might add the reversed edge $(v,u)$ to the residual graph. It is not hard to verify that $c^{\pi'}(v,u) = -c^{\pi'}(u,v)$ and thus the new edge $(v,u)$ also satisfies the reduced cost optimality condition. The claim follows. □

This corollary leads to the following idea: Start with an arbitrary pseudoflow $x$ and potentials $\pi$ such that the reduced cost optimality conditions are satisfied. We then repeatedly compute a shortest path $P$ from some excess node $s \in V^+$ to a deficit node $t \in V^-$ in $G_x$ with respect to $c^\pi$ and push the maximum possible amount of flow from $s$ to $t$ along $P$. The shortest path distances are used to update $\pi$. The algorithm stops if no further excess node exists. Note that by the above corollary the pseudoflow $x$ satisfies the reduced cost optimality conditions at all times. Eventually, $x$ becomes a feasible flow. By Theorem 6.4, $x$ is then a minimum cost flow. The algorithm is summarized in Algorithm 10; see Figure 9 for an illustration.



Input: directed graph $G = (V,E)$, capacity function $w : E \to \mathbb{R}^+$, cost function $c : E \to \mathbb{R}^+$ and balance function $b : V \to \mathbb{R}$.
Output: minimum cost flow $x : E \to \mathbb{R}^+$.

Initialize: $x(u,v) = 0$ for every $(u,v) \in E$ and $\pi(u) = 0$ for every $u \in V$
$exs(u) = b(u)$ for every $u \in V$
let $V^+ = \{u \in V \mid exs(u) > 0\}$ and $V^- = \{u \in V \mid exs(u) < 0\}$
while $V^+ \ne \emptyset$ do
    choose a source node $s \in V^+$
    compute shortest path distances $\delta : V \to \mathbb{R}$ from $s$ to all other nodes $u \in V$ in $G_x$ with respect to $c^\pi$
    let $P$ be a shortest path from $s$ to some node $t \in V^-$
    update $\pi \leftarrow \pi - \delta$
    augment $\Delta = \min\{exs(s), -exs(t), r_x(P)\}$ units of flow along $P$
    update $x$, $G_x$, $exs(s)$, $exs(t)$, $V^+$, $V^-$ and $c^\pi$
end
return $x$

Algorithm 10: Successive shortest path algorithm.



Theorem 6.5. The successive shortest path algorithm computes a minimum cost flow in
time $O(nB(m + n \log n))$.



Proof. We show by induction on the number of iterations that the pseudoflow x satisfies 
the reduced cost optimality conditions with respect to %. This is sufficient to establish the 
correctness of the algorithm because the algorithm terminates with V + = V ~ = and the 
final pseudoflow x is thus a flow. It then follows from Theorem 6.4 that x is a minimum 
cost flow. 

After the initialization, $x$ is a pseudoflow and $G_x = G$. Since $\pi(u) = 0$ for every $u \in V$,
$c^\pi(u,v) = c(u,v)$ for every $(u,v) \in E_x$. Since all edge costs are assumed to be nonnegative,
$x$ satisfies the reduced cost optimality conditions with respect to $\pi$. Let $x$ be the
pseudoflow at the beginning of iteration $i$ and assume that it satisfies the reduced cost
optimality conditions with respect to $\pi$. The shortest path distances $\delta$ are well-defined
because $G_x$ does not contain a negative cycle with respect to $c^\pi$. By (7), $V^+$ is nonempty
iff $V^-$ is nonempty. The algorithm therefore succeeds in finding a shortest path from $s$
to some node $t \in V^-$ because otherwise the problem would be infeasible. (Recall that
we assume that there is a feasible solution; see Assumption 6.2.) By Corollary 6.1, the
pseudoflow that we obtain from $x$ by sending $\Delta$ units along $P$ satisfies the reduced cost
optimality conditions with respect to $\pi - \delta$.

Figure 9: Illustration of the successive shortest path algorithm. The residual graph $G_x$
with respect to the current pseudoflow $x$ is depicted. Every edge $(u,v)$ is labeled with
$c^\pi(u,v), r_x(u,v)$ and every node $u$ is labeled with $exs(u), \pi(u)$. (a) $G_x$ with respect to
$x = 0$ and $\pi = 0$. (b) $G_x$ after potential update: two units of flow are sent along the bold
path. (c) $G_x$ after flow augmentation. (d) $G_x$ after potential update: two units of flow are
sent along the bold path. (e) $G_x$ after flow augmentation: no further excess/deficit nodes
exist and the resulting flow is optimal.

It remains to be shown that the algorithm terminates. In each iteration, $\Delta$ is chosen such
that either $s$ or $t$ becomes balanced or one of the edges on $P$ becomes saturated. Because
$exs(s) > 0$, $-exs(t) > 0$ and $r_x(P) > 0$, we have $\Delta > 0$, so each iteration strictly
reduces the excess of the chosen source node $s$. Since we assume that all input data is
integral, the excess of $s$ is reduced by at least 1. The algorithm therefore terminates after
at most $nB$ iterations. Each iteration requires solving a single-source shortest path problem
with respect to $c^\pi$. Because the reduced costs $c^\pi$ are nonnegative, we can use Dijkstra's
algorithm, which requires $O(m + n \log n)$ time. The overall running time of the successive
shortest path algorithm is thus $O(nB(m + n \log n))$. □



6.5 Primal-Dual Algorithm 

We use linear programming duality to derive our third algorithm for the minimum cost
flow problem. We associate a dual variable $\pi(u)$ with every node $u \in V$ and a dual
variable $\alpha(e)$ with every edge $e \in E$. The dual of the linear program (6) is as follows:

    maximize    $\sum_{u \in V} b(u)\pi(u) - \sum_{e \in E} w(e)\alpha(e)$
    subject to  $\pi(u) - \pi(v) - \alpha(u,v) \le c(u,v)$    $\forall (u,v) \in E$        (8)
                $\alpha(e) \ge 0$    $\forall e \in E$

As in the previous section, let the reduced cost of an edge $(u,v) \in E$ be defined as
$c^\pi(u,v) = c(u,v) - \pi(u) + \pi(v)$. The above constraints then require that
$-\alpha(u,v) \le c^\pi(u,v)$ and $\alpha(u,v) \ge 0$ for every edge $(u,v) \in E$. Since the dual has a
maximization objective and because capacities are nonnegative, an optimal solution to (8)
satisfies $\alpha(u,v) = \max\{0, -c^\pi(u,v)\}$. In a sense, the dual variables $\alpha(u,v)$ are therefore
redundant: Given optimal dual values $\pi(u)$ for every $u \in V$, we can extend this solution to a
feasible dual solution $(\pi, \alpha)$ of (8) using the above relation.

We next derive the complementary slackness conditions of the primal linear program (6)
and the dual linear program (8):

1. Primal complementary slackness condition: for every edge $e \in E$:

       $f(e) > 0 \;\Rightarrow\; \alpha(e) = -c^\pi(e)$,

   which is equivalent to

       $f(e) > 0 \;\Rightarrow\; c^\pi(e) \le 0$.

2. Dual complementary slackness condition: for every edge $e \in E$:

       $\alpha(e) > 0 \;\Rightarrow\; f(e) = w(e)$,

   which is equivalent to

       $c^\pi(e) < 0 \;\Rightarrow\; f(e) = w(e)$.

Theorem 6.6 (Complementary slackness optimality conditions). A feasible flow $f$ of $G$ is
a minimum cost flow if and only if there exist dual values $\pi(u)$ for every $u \in V$ satisfying
that for every edge $e \in E$:

1. If $c^\pi(e) > 0$ then $f(e) = 0$.

2. If $c^\pi(e) < 0$ then $f(e) = w(e)$.

Proof. The proof follows directly from the complementary slackness conditions. □
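The two conditions of Theorem 6.6, together with the relation $\alpha(u,v) = \max\{0, -c^\pi(u,v)\}$, are easy to check mechanically. The following Python sketch does so on a hypothetical three-node instance; the function names and the dictionary-based representation are assumptions made for illustration only.

```python
def reduced_cost(c, pi, e):
    """c^pi(u,v) = c(u,v) - pi(u) + pi(v)."""
    u, v = e
    return c[e] - pi[u] + pi[v]

def satisfies_cs_conditions(w, c, f, pi):
    """Check: c^pi(e) > 0 implies f(e) = 0, and c^pi(e) < 0 implies f(e) = w(e)."""
    for e in c:
        rc = reduced_cost(c, pi, e)
        if rc > 0 and f[e] != 0:
            return False
        if rc < 0 and f[e] != w[e]:
            return False
    return True

def dual_objective(b, w, c, pi):
    """Objective of (8), with alpha(e) = max{0, -c^pi(e)} filled in."""
    alpha = {e: max(0, -reduced_cost(c, pi, e)) for e in c}
    return sum(b[u] * pi[u] for u in b) - sum(w[e] * alpha[e] for e in c)

# Hypothetical instance: send 2 units from s to t.
w = {("s", "a"): 2, ("a", "t"): 2, ("s", "t"): 1}      # capacities
c = {("s", "a"): 1, ("a", "t"): 1, ("s", "t"): 1}      # costs
b = {"s": 2, "a": 0, "t": -2}                          # balances
f = {("s", "a"): 1, ("a", "t"): 1, ("s", "t"): 1}      # a feasible flow
pi = {"s": 0, "a": -1, "t": -2}                        # candidate potentials

primal_cost = sum(c[e] * f[e] for e in c)
```

On this instance the flow $f$ and the potentials $\pi$ satisfy both conditions, and the dual objective value coincides with the primal cost, as strong duality predicts.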

The complementary slackness optimality conditions can actually be seen to be equivalent
to the reduced cost optimality conditions that we introduced earlier:

Theorem 6.7. A feasible flow $f$ satisfies the reduced cost optimality conditions with
respect to node potentials $\pi : V \to \mathbb{R}$ if and only if $f$ satisfies the complementary
slackness optimality conditions with respect to $\pi$.

Proof. Suppose $c^\pi(u,v) \ge 0$ for every $(u,v) \in E_f$. Let $(u,v) \in E$ and suppose
$c^\pi(u,v) < 0$. Then $(u,v) \notin E_f$ and thus $f(u,v) = w(u,v)$. Next suppose $c^\pi(u,v) > 0$.
Because $c^\pi(v,u) = -c^\pi(u,v) < 0$, the backward edge $(v,u)$ is not part of $G_f$ and thus
$f(u,v) = 0$.

Conversely, assume that the complementary slackness conditions are satisfied for every edge
$(u,v) \in E$. Consider a forward edge $(u,v) \in E_f$. Then $f(u,v) < w(u,v)$ and thus
$c^\pi(u,v) \ge 0$. Next consider a backward edge $(v,u) \in E_f$. Then $f(u,v) > 0$ and thus
$c^\pi(u,v) \le 0$. Since $c^\pi(v,u) = -c^\pi(u,v)$, we conclude that $c^\pi(v,u) \ge 0$. □

The primal-dual algorithm for the minimum cost flow problem follows a general primal-dual
paradigm: We start with an infeasible primal solution $x$ and a feasible dual solution
$\pi$. We ensure that the algorithm satisfies the complementary slackness conditions with
respect to $x$ and $\pi$ throughout the entire execution of the algorithm. The algorithm
successively reduces the degree of infeasibility of the primal solution $x$ with respect to the
current dual solution $\pi$. If no further improvement is possible, then $\pi$ is updated so
as to ensure that the infeasibility of $x$ can be further reduced. The dual solution $\pi$ remains
feasible throughout the entire process. Eventually, $x$ is a feasible primal solution and thus
a minimum cost flow.

The algorithm works with a transformed instance of the problem having exactly one excess
and one deficit node: Augment the original graph by adding a super-source $s$ and
a super-target $t$. Add an edge $(s,u)$ for every node $u \in V$ with $b(u) > 0$ of capacity
$w(s,u) = b(u)$ and cost $c(s,u) = 0$. Similarly, add an edge $(u,t)$ for every node $u \in V$ with
$b(u) < 0$ of capacity $w(u,t) = -b(u)$ and cost $c(u,t) = 0$. Let $b(s) = \sum_{u \in V : b(u) > 0} b(u)$
and $b(t) = \sum_{u \in V : b(u) < 0} b(u)$. All other balances are set to zero. Clearly, every minimum
cost flow in the augmented network corresponds to a minimum cost flow in the original
network and vice versa. Subsequently, we will use the augmented network.

The algorithm starts with the pseudoflow $x(e) = 0$ for every $e \in E$ and dual $\pi(u) = 0$
for every $u \in V$. Note that $x$ is an infeasible primal solution and $\pi$ is a feasible dual
solution. Also $x$ and $\pi$ satisfy the complementary slackness conditions because for every
edge $e \in E$, $x(e) = 0$ and $c^\pi(e) = c(e) \ge 0$.
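The transformation to a single super-source and super-target, together with the initial primal and dual solutions, can be sketched as follows. The function name `add_super_source_and_target`, the node labels `"s*"` and `"t*"`, and the dictionary layout are hypothetical choices for this illustration.

```python
def add_super_source_and_target(nodes, edges, b, s="s*", t="t*"):
    """nodes: list of node labels; edges: dict (u, v) -> (capacity, cost);
    b: dict node -> balance. Returns the augmented (nodes, edges, b)."""
    new_edges = dict(edges)
    new_b = {u: 0 for u in nodes}              # all original balances become zero
    for u in nodes:
        if b[u] > 0:
            new_edges[(s, u)] = (b[u], 0)      # capacity b(u), cost 0
        elif b[u] < 0:
            new_edges[(u, t)] = (-b[u], 0)     # capacity -b(u), cost 0
    new_b[s] = sum(b[u] for u in nodes if b[u] > 0)
    new_b[t] = sum(b[u] for u in nodes if b[u] < 0)
    return list(nodes) + [s, t], new_edges, new_b

# Hypothetical instance with two excess nodes and one deficit node.
nodes = ["a", "b", "c"]
edges = {("a", "c"): (4, 2), ("b", "c"): (3, 1)}
balance = {"a": 2, "b": 1, "c": -3}
aug_nodes, aug_edges, aug_balance = add_super_source_and_target(nodes, edges, balance)

# Initial pseudoflow x = 0 and dual pi = 0 on the augmented network.
x = {e: 0 for e in aug_edges}
pi = {u: 0 for u in aug_nodes}
```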

In order to reduce the infeasibility of $x$, the algorithm basically pushes as much flow as
possible from $s$ to $t$ along shortest $s,t$-paths in $G_x$. Let $\delta : V \to \mathbb{R}$ be the shortest path
distances from $s$ to all other nodes $u \in V$ in $G_x$ with respect to $c^\pi$. Define $\pi' = \pi - \delta$.
Then every shortest $s,t$-path in $G_x$ with respect to $c^\pi$ is a zero cost path with respect to
$c^{\pi'}$ and vice versa. Let the admissible graph $G_x^0$ be the subgraph of $G_x = (V,E_x)$ that
consists of all edges $e \in E_x$ with $c^{\pi'}(e) = 0$. The algorithm computes a maximum flow $g^0$
in $G_x^0$, where the capacities of the edges are their respective residual capacities. We can then
augment $x$ by $g^0$ in the obvious way: Increase the flow value $x(u,v)$ of every forward edge
$(u,v) \in E$ by $g^0(u,v)$ and decrease the flow value $x(u,v)$ of every backward edge $(v,u)$
by $g^0(v,u)$. As a result, the excess at $s$ is reduced by the flow value $|g^0|$. It is not hard
to see that the resulting flow $x'$ is a pseudoflow. Moreover, in light of Theorem 6.7 and
Corollary 6.1, $x'$ satisfies the complementary slackness conditions with respect to the new
feasible dual $\pi'$.
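Constructing the admissible graph amounts to filtering the residual edges by their reduced cost with respect to the updated potentials. A minimal sketch, assuming the residual graph is given as a dictionary and the distances $\delta$ have already been computed (the function name `admissible_edges` and the example numbers are hypothetical):

```python
def admissible_edges(residual, pi):
    """residual: dict (u, v) -> (residual capacity, cost).
    Returns dict (u, v) -> residual capacity for all residual edges whose
    reduced cost c(u,v) - pi(u) + pi(v) is zero."""
    return {
        (u, v): r
        for (u, v), (r, c) in residual.items()
        if r > 0 and c - pi[u] + pi[v] == 0
    }

# Hypothetical residual graph for x = 0 (so it equals the input graph).
residual = {(0, 1): (2, 1), (0, 2): (2, 2), (1, 3): (2, 3), (2, 3): (2, 1), (1, 2): (1, 0)}
pi = [0, 0, 0, 0]
delta = [0, 1, 1, 2]                         # shortest path distances from node 0
new_pi = [p - d for p, d in zip(pi, delta)]  # pi' = pi - delta
admissible = admissible_edges(residual, new_pi)
```

Here the admissible graph consists exactly of the edges on the shortest path 0, 1, 2, 3; a maximum flow in it is limited by the residual capacity 1 of edge (1, 2).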

The algorithm continues in this manner until eventually the total excess of $s$ is exhausted
and the pseudoflow $x$ becomes a flow. Since the algorithm maintains the invariant that $x$
and $\pi$ satisfy the complementary slackness conditions and $\pi$ is a feasible dual solution, $x$
(and also $\pi$) are eventually optimal solutions to the respective linear programs in (6) (and
(8)).

The algorithm is summarized in Algorithm 11; see Figure 10 for an illustration.



Input: directed graph $G = (V,E)$, capacity function $w : E \to \mathbb{R}^+$, cost function
       $c : E \to \mathbb{R}^+$ and balance function $b : V \to \mathbb{R}$.
Output: minimum cost flow $x : E \to \mathbb{R}^+$.

Initialize: $x(u,v) = 0$ for every $(u,v) \in E$ and $\pi(u) = 0$ for every $u \in V$
$exs(s) = b(s)$
while $exs(s) > 0$ do
    compute shortest path distances $\delta : V \to \mathbb{R}$ from $s$ to all other nodes $u \in V$
        in $G_x$ with respect to $c^\pi$
    update $\pi \leftarrow \pi - \delta$
    construct the admissible network $G_x^0$
    compute a maximum flow $g^0$ from $s$ to $t$ in $G_x^0$
    augment $x$ by $g^0$
    update $x$, $exs(s)$, $G_x$ and $c^\pi$
end
return $x$





Algorithm 11: Primal-dual algorithm. 



Theorem 6.8. The primal-dual algorithm computes a minimum cost flow in time
$O(\min\{nC, nB\} \cdot nm^2)$.

Figure 10: Illustration of the primal-dual algorithm. The residual graph $G_x$ with respect
to the current pseudoflow $x$ is depicted. Every edge $(u,v)$ is labeled with $c^\pi(u,v), r_x(u,v)$
and every node $u$ is labeled with $exs(u), \pi(u)$. (a) $G_x$ with respect to $x = 0$ and $\pi = 0$
of the transformed network of the example instance depicted in Figure 8(a). (b) $G_x$ after
potential update: max flow in $G_x^0$ has value 2. (c) $G_x$ after flow augmentation. (d) $G_x$ after
potential update: max flow in $G_x^0$ has value 2. (e) $G_x$ after flow augmentation: excess of
source node $s$ is zero and the resulting flow is optimal.

Proof. The correctness of the algorithm follows from the discussion above.

Observe that in each iteration, the excess of $s$ is reduced by at least 1 (assuming integer
capacities and balances). The number of iterations is thus at most $nB$, which bounds
the excess of $s$ at the beginning of the algorithm.

We can establish a second bound on the number of iterations: In each iteration, $g^0$ is a
maximum flow in $G_x^0$. By the max-flow min-cut theorem, there is a cut $(X, \bar{X})$ in $G_x^0$ such
that for every edge $(u,v)$ with $u \in X$ and $v \in \bar{X}$, $g^0(u,v) = r_x(u,v)$. As a consequence,
after the augmentation, all these edges vanish from the residual graph $G_{x'}$ of the new flow
$x'$. Thus, every $s,t$-path in $G_{x'}$ has length at least 1 with respect to $c^\pi$ (because edge
costs are integral). The potential of $t$ therefore reduces by at least 1 in the next iteration.
Note that no node potential $\pi(u)$ for $u \neq s$ can ever be less than $-nC$ (think about it!).
The total number of iterations is therefore bounded by $nC$.

The running time of each iteration is dominated by the shortest path and max flow
computations. The total running time is thus at most $O(\min\{nC, nB\} \cdot nm^2)$. □

References 

The presentation of the material in this section is based on [1, Chapter 9].

Figure 11: Illustration of the existence of an M-augmenting path.

7. Matchings 

7.1 Introduction 

Recall that a matching $M$ in an undirected graph $G = (V,E)$ is a subset of edges satisfying
that no two edges share a common endpoint. More formally, $M \subseteq E$ is a matching if for
every two distinct edges $(u,v), (x,y) \in M$ we have $\{u,v\} \cap \{x,y\} = \emptyset$. Every node $u \in V$
that is incident to a matching edge is said to be matched; all other nodes are said to be
free. A matching $M$ is perfect if every node $u \in V$ is matched by $M$.

We consider the following optimization problem: 

Maximum Matching Problem: 

Given: An undirected graph $G = (V,E)$.

Goal: Compute a matching $M \subseteq E$ of $G$ of maximum size.

Note that if the underlying graph is bipartite, then we can solve the maximum matching 
problem by a maximum flow computation. 

Given two sets $S, T \subseteq E$, let $S \,\Delta\, T$ denote the symmetric difference of $S$ and $T$, i.e.,
$S \,\Delta\, T = (S \setminus T) \cup (T \setminus S)$.

7.2 Augmenting Paths 

Given a matching M, a path P is called M-alternating (or simply alternating) if the edges 
of P are alternately in M and not in M. If the first and last node of an M-alternating 
path P are free, then P is called an M-augmenting (or augmenting) path. Note that an 
augmenting path must have an odd number of edges. An M-augmenting path P can be 
used to increase the size of M: Simply make every non-matching edge on P a matching 
edge and vice versa. We also say that we augment M along P.

Figure 12: Illustration of an alternating tree. The nodes in X and Y are indicated in white
and gray, respectively. Note that there is an augmenting path from r to v.
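Augmenting $M$ along $P$ is the same as replacing $M$ by the symmetric difference of $M$ and the edge set of $P$. The following Python sketch illustrates this on a path with three edges; representing an undirected edge as a `frozenset` of its endpoints is an assumption of this example, and the helper names are hypothetical.

```python
def edge(u, v):
    """Undirected edge as an unordered pair."""
    return frozenset((u, v))

def is_matching(M):
    """True if no two edges of M share an endpoint."""
    seen = set()
    for e in M:
        if seen & e:
            return False
        seen |= e
    return True

def augment(M, path):
    """Augment matching M along the augmenting path given as a node list."""
    P = {edge(u, v) for u, v in zip(path, path[1:])}
    return M ^ P          # symmetric difference (M \ P) | (P \ M)

M = {edge("b", "c")}
path = ["a", "b", "c", "d"]   # a and d are free, edge (b, c) is matched
M_new = augment(M, path)
```

The path has an odd number of edges, two of them outside $M$ and one inside, so the new matching has exactly one edge more than the old one.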

Theorem 7.1. A matching M in a graph G = (V,E) is maximum if and only if there is no 
M-augmenting path. 

Proof. Suppose $M$ is maximum and there is an $M$-augmenting path $P$. Then augmenting
$M$ along $P$ gives a new matching $M' = M \,\Delta\, P$ of size $|M| + 1$, which is a contradiction.

Suppose that $M$ is not maximum. Let $M^*$ be a maximum matching. Consider the symmetric
difference $M \,\Delta\, M^*$. Because $M$ and $M^*$ are matchings, the subgraph $G' = (V, M \,\Delta\, M^*)$
consists of isolated nodes and node-disjoint paths and cycles. The edges of every such
path or cycle belong alternately to $M$ and $M^*$. Each cycle therefore has an even number
of edges. Because $|M^*| > |M|$ there must exist one path $P$ that has more edges of $M^*$ than
of $M$. $P$ is an $M$-augmenting path; see Figure 11 for an illustration. □




7.3 Bipartite Graphs 

The above theorem suggests how to compute a maximum matching: Start with the
empty matching $M = \emptyset$. Find an $M$-augmenting path $P$ and augment $M$ along $P$. Repeat
this procedure until no $M$-augmenting path exists and $M$ is maximum.

A natural approach to search for augmenting paths is to iteratively build an alternating
tree. Suppose $M$ is a matching and $r$ is a free node. We inductively construct a tree $T$
rooted at $r$ as follows. We partition the node set of $T$ into two sets $X$ and $Y$: For every
node $u \in X$, there is an even-length alternating path from $r$ to $u$ in $T$; for every node $u \in Y$,
there is an odd-length alternating path from $r$ to $u$ in $T$. We start with $X = \{r\}$ and $Y = \emptyset$
and then iteratively extend $T$ using the following operation:

Extend tree using $(u,v)$:

(Precondition: $(u,v) \in E$, $u \in X$, $v \notin X \cup Y$ and $(v,w) \in M$)
Add edge $(u,v)$ to $T$, $v$ to $Y$, edge $(v,w)$ to $T$ and $w$ to $X$

This way we obtain a layered tree rooted at $r$ (starting with layer 0); see Figure 12 for
an illustration. All nodes in $X$ are on even layers and all nodes in $Y$ are on odd layers.
Moreover, every node in layer $2i - 1$ ($i \ge 1$) is matched to a node in layer $2i$. In particular,
$|X| = |Y| + 1$.

Input: undirected bipartite graph $G = (V,E)$.
Output: maximum matching $M$.

Initialize: $M = \emptyset$
foreach $r \in V$ do
    if $r$ is matched then continue
    else
        $X = \{r\}$, $Y = \emptyset$, $T = \emptyset$
        while there exists an edge $(u,v) \in E$ with $u \in X$ and $v \notin X \cup Y$ do
            if $v$ is free then AUGMENT MATCHING USING $(u,v)$
            else EXTEND TREE USING $(u,v)$
        end
    end
end
return $M$



Algorithm 12: Augmenting path algorithm. 
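Algorithm 12 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the graph is given as an adjacency dictionary containing every node of both sides as a key, the tree is grown in breadth-first order (the notes leave the order open), and the search for a root stops as soon as one augmentation has been carried out. The name `max_bipartite_matching` and the `mate`/`parent` bookkeeping are hypothetical.

```python
from collections import deque

def max_bipartite_matching(adj):
    """adj: dict node -> list of neighbours of an undirected bipartite graph.
    Returns dict mate with mate[u] = partner of u, or None if u is free."""
    mate = {u: None for u in adj}
    for r in adj:
        if mate[r] is not None:
            continue                      # r is matched
        X, Y = {r}, set()
        parent = {}                       # tree parent (in X) of every node in Y
        queue = deque([r])
        augmented = False
        while queue and not augmented:
            u = queue.popleft()
            for v in adj[u]:
                if v in X or v in Y:
                    continue
                parent[v] = u
                if mate[v] is None:
                    # AUGMENT MATCHING USING (u, v): flip the r,u-path plus (u, v).
                    while v is not None:
                        u = parent[v]
                        previous = mate[u]        # None exactly when u == r
                        mate[u], mate[v] = v, u
                        v = previous
                    augmented = True
                    break
                # EXTEND TREE USING (u, v): v joins Y, its partner w joins X.
                w = mate[v]
                Y.add(v)
                X.add(w)
                queue.append(w)
    return mate

# Hypothetical example: left side a, b, c and right side 1, 2, 3.
graph = {
    "a": ["1", "2"], "b": ["1"], "c": ["2", "3"],
    "1": ["a", "b"], "2": ["a", "c"], "3": ["c"],
}
mate = max_bipartite_matching(graph)
```

In the example, the root `b` forces the previously chosen edge (a, 1) to be exchanged along the augmenting path b, 1, a, 2, which is exactly the flip performed in the inner `while` loop.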



Suppose that during the extension of the alternating tree $T$ we encounter an edge $(u,v) \in E$
with $u \in X$ and $v \notin X \cup Y$ being a free node. We have then found an augmenting path from
$r$ to $v$; see Figure 12.

Augment matching using $(u,v)$:

(Precondition: $(u,v) \in E$, $u \in X$, $v \notin X \cup Y$ free)

Augment $M$ along the concatenation of the $r,u$-path in $T$ with edge $(u,v)$

These two operations form the basis of the augmenting path algorithm given in
Algorithm 12.

The correctness of the algorithm depends on whether alternating trees truly capture all 
augmenting paths. Clearly, whenever the algorithm finds an augmenting path starting at 
r, this is an augmenting path. But can we conclude that there is no augmenting path if 
the algorithm does not find one? As it turns out, the algorithm works correctly if the 
underlying graph satisfies the unique label property: A graph satisfies the unique label 
property with respect to a given matching M and a root node r if the above tree building 
procedure uniquely assigns every node $u \in V(T)$ to one of the sets X and Y, irrespective
of the order in which the nodes are examined. 

Lemma 7.1. Suppose a graph satisfies the unique label property. If there exists an
$M$-augmenting path, then the augmenting path algorithm finds it.

Proof. Let $P = (r, \dots, u, v)$ be an augmenting path with respect to $M$. Because of the
unique label property, the algorithm always ends up adding node $u$ to $X$ and thus
discovers an augmenting path via edge $(u,v)$. □

Using the above characterization, we can show that the augmenting path algorithm given
in Algorithm 12 is correct for bipartite graphs: Recall that in a bipartite graph, the node
set $V$ is partitioned into two sets $V_0$ and $V_1$. Every node that is part of $V(T)$ and belongs
to the set $V_i$ with $r \in V_i$ is added to $X$; those that belong to $V_{1-i}$ are added to $Y$. Thus
bipartite graphs satisfy the unique label property.

Figure 13: Illustration of a blossom shrinking. (a) The odd cycle $B = (b,p,u,v,q,b)$
constitutes a blossom with base $b$ and stem $(r,x,b)$. Note that there is an augmenting
path from $z$ to $r$ via edge $(u,v)$. (b) The resulting graph after shrinking blossom $B$ into a
super-node $b$.

Theorem 7.2. The augmenting path algorithm computes a maximum matching in bipartite
graphs in time $O(nm)$.

Proof. The correctness of the algorithm follows from the discussion above. Note that each
iteration can be implemented to run in time $O(n + m)$ and there are at most $n$ iterations. □



7.4 General Graphs 

It is not hard to see that graphs do not in general satisfy the unique label property. Consider
an odd cycle consisting of three edges $(r,u)$, $(u,v)$, $(v,r)$ and suppose that $(u,v) \in M$ and
$r$ is free. Then the algorithm adds $u$ to $Y$ if it considers edge $(r,u)$ first, while it adds $u$
to $X$ if it considers edge $(r,v)$ first. Odd cycles are precisely the objects that cause this
dilemma (and which are not present in bipartite graphs).

A deep insight that was first gained by Edmonds in 1965 is that one can "shrink" such odd
cycles. Suppose during the construction of the alternating tree, the algorithm encounters
an edge $(u,v)$ with $u,v \in X$; see Figure 13(a). Let $b$ be the lowest common ancestor of
$u$ and $v$ in $T$. Note that $b \in X$. Consider the cycle $B$ that follows the unique $b,u$-path in
$T$, then edge $(u,v)$ and then the unique $v,b$-path in $T$. $B$ is an odd length cycle, which
is also called a blossom. The node $b$ is called the base of $B$. The even length path from
$b$ to the root node $r$ is called the stem of $B$; if $r = b$ then we say that the stem of $B$ is
empty. Suppose we shrink the cycle $B$ to a super-node, which we identify with $b$; see
Figure 13(b). Note that the super-node $b$ belongs to $X$ after shrinking.

Shrink blossom using $(u,v)$:

(Precondition: $(u,v) \in E$ and $u,v \in X$)

Let $b$ be the lowest common ancestor of $u$ and $v$ in $T$.

Shrink the blossom $B = (b, \dots, u, v, \dots, b)$ to a super-node $b$.






Let G' be the resulting graph and let M' be the restriction of M to the edges of G' . The 
next two lemmas show that by shrinking blossoms, we do not add or lose any augmenting 
paths. 

Lemma 7.2. Suppose there is an $M'$-augmenting path $P'$ from $r$ to $v$ (or the respective
super-node) in $G'$. Then there is an $M$-augmenting path from $r$ to $v$ in $G$.

Proof. If $P'$ does not involve the super-node $b$, then $P'$ is also an augmenting path in $G$.
Suppose $P'$ contains the super-node $b$. There are two cases we need to consider:

Case 1: $r \neq b$. Let $P' = (r, \dots, x, b, z, \dots, v)$ be the augmenting path in $G'$. Let $P'_{rx}$ and
$P'_{zv}$ refer to the subpaths $(r, \dots, x)$ and $(z, \dots, v)$ of $P'$, respectively. Note that
$(x,b) \in M'$ and $(b,z) \notin M'$. If we expand the blossom $B$ corresponding to super-node $b$, then
$b$ is the base of $B$ with incident matching edge $(x,b)$. Let $p$ be the node of $B$ such that
$(p,z)$ is part of $G$. Then there is an even length $M$-alternating path $P_{bp} = (b, \dots, p)$ from
$b$ to $p$ in $B$. The path $P = (P'_{rx}, (x,b), P_{bp}, (p,z), P'_{zv})$ is an $M$-augmenting path in $G$.

Case 2: $r = b$. Let $P' = (b, z, \dots, v)$ be the augmenting path in $G'$. Let $P'_{zv}$ refer to the
subpath $(z, \dots, v)$ of $P'$. If we expand the blossom $B$ corresponding to super-node $b$, then
$b$ is the base of $B$, which is free. Let $p$ be the node of $B$ such that $(p,z)$ is part of $G$. Then
there is an even length $M$-alternating path $P_{bp} = (b, \dots, p)$ from $b$ to $p$ in $B$. The path
$P = (P_{bp}, (p,z), P'_{zv})$ is an $M$-augmenting path in $G$. □

Lemma 7.3. Suppose there is an $M$-augmenting path $P$ from $r$ to $v$ in $G$. Then there is an
$M'$-augmenting path from $r$ to $v$ (or the respective super-nodes) in $G'$.

Proof. We assume without loss of generality that $r$ and $v$ are the only free nodes with
respect to $M$. (Otherwise, we can remove all other free nodes from $G$ without affecting
$P$.) If $P$ has no nodes in common with the nodes of the blossom $B$, then $P$ is an
$M'$-augmenting path in $G'$ and we are done. Suppose $P = (r, \dots, v)$ contains some nodes of
$B$. We consider two cases:

Case 1: The stem of $B$ is empty. The base $b$ of $B$ is then a free node and therefore coincides
with one of the endpoints of $P$. Assume that $r = b$; the other case follows similarly. Let $p$
be the last node of $P$ that is part of $B$ and let $P_{pv} = (p, z, \dots, v)$ be the subpath of $P$ starting
at $p$. Note that $(p,z) \notin M$. The path $P' = (b, z, \dots, v)$ is then an $M'$-augmenting path in
$G'$.

Case 2: The stem of $B$ is non-empty. Let $P_{rb} = (r, \dots, b)$ be the stem of $B$. Consider the
matching $\bar{M} = M \,\Delta\, P_{rb}$. Then $r$ is matched in $\bar{M}$ and thus $b$ and $v$ are the only free nodes
with respect to $\bar{M}$. Further, $|\bar{M}| = |M|$. Note that $M$ is not a maximum matching (because
there is an $M$-augmenting path in $G$) and thus also $\bar{M}$ is not a maximum matching. Thus,
there is an $\bar{M}$-augmenting path $\bar{P}$ in $G$ that starts at $b$ and ends at $v$. Note that the stem
of $B$ with respect to $\bar{M}$ is empty and we can thus use the proof of Case 1 to show that the
contracted graph $G'$ contains an $\bar{M}'$-augmenting path from $b$ to $v$. Note that $\bar{M}'$ is different
from $M'$. However, because $|\bar{M}'| = |M'|$ we conclude that $G'$ must contain an augmenting
path with respect to $M'$ as well. □






The matching algorithm for general graphs is also known as the blossom-shrinking
algorithm. The algorithm maintains a graph $G'$ of super-nodes and a respective matching $M'$
on the super-nodes throughout each iteration. At the end of each iteration, all super-nodes
of $G'$ are expanded and the matching $M$ on the original graph is obtained from $M'$ as
described in the proof of Lemma 7.2.



Input: undirected graph $G = (V,E)$.
Output: maximum matching $M$.

Initialize: $M = \emptyset$
foreach $r \in V$ do
    if $r$ is matched then continue
    else
        $G' \leftarrow G$ and $M' \leftarrow M$
        $X \leftarrow \{r\}$, $Y \leftarrow \emptyset$, $T \leftarrow \emptyset$
        while there exists an edge $(u,v) \in E'$ with $u \in X$ and $v \notin Y$ do
            if $v$ is free, $v \neq r$ then AUGMENT MATCHING USING $(u,v)$
            else if $v \notin X \cup Y$, $(v,w) \in M'$ then EXTEND TREE USING $(u,v)$
            else SHRINK BLOSSOM USING $(u,v)$
        end
        Extend $M'$ to a matching $M$ of $G$ by expanding all super-nodes of $G'$.
    end
end
return $M$



Algorithm 13: Blossom shrinking algorithm. 



Theorem 7.3. The blossom-shrinking algorithm computes a maximum matching in general
graphs in time $O(nm\,\alpha(n, \frac{m}{n}))$.

Proof. The correctness of the algorithm follows from Lemmas 7.2 and 7.3 and
Theorem 7.1. It remains to show that the algorithm can be implemented to run in time
$O(m\,\alpha(n, \frac{m}{n}))$ per iteration. The key here is to maintain an implicit representation of
the graph $G'$ of super-nodes: We keep track of the partition of the original nodes into
super-nodes by means of a union-find data structure. When considering an edge $(u,v) \in E$
during an iteration, we need to check whether edge $(u,v)$ is part of $G'$. This can be done by
verifying whether $u$ and $v$ belong to the same set of the partition (the edge is part of $G'$
exactly if they do not). Shrinking a blossom is tantamount to uniting the node sets of the
respective super-nodes. We have at most $2m$ find and $n$ union operations per iteration and
these operations take time $O(n + m\,\alpha(n, \frac{m}{n}))$. All remaining operations (extending the
tree, augmenting the matching, extracting the matching on $G$) can be done in time $O(n + m)$
per iteration. The time bound follows. □
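The implicit representation of $G'$ described in the proof can be sketched with a standard union-find structure (union by rank with path compression). The class name `SuperNodes`, its method names and the extra `base` dictionary that remembers which original node identifies a super-node are assumptions of this sketch, not part of the notes.

```python
class SuperNodes:
    """Partition of the original nodes into super-nodes of G'."""

    def __init__(self, nodes):
        self.parent = {u: u for u in nodes}
        self.rank = {u: 0 for u in nodes}
        self.base = {u: u for u in nodes}   # representative -> base of super-node

    def find(self, u):
        root = u
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[u] != root:       # path compression
            nxt = self.parent[u]
            self.parent[u] = root
            u = nxt
        return root

    def union(self, u, v):
        ru, rv = self.find(u), self.find(v)
        if ru == rv:
            return ru
        if self.rank[ru] < self.rank[rv]:   # union by rank
            ru, rv = rv, ru
        self.parent[rv] = ru
        if self.rank[ru] == self.rank[rv]:
            self.rank[ru] += 1
        return ru

    def shrink(self, blossom, b):
        """Unite all nodes of the blossom; the super-node is identified with b."""
        root = self.find(b)
        for u in blossom:
            root = self.union(root, u)
        self.base[root] = b

    def super_node(self, u):
        return self.base[self.find(u)]

    def in_contracted_graph(self, u, v):
        """Edge (u, v) of G survives in G' iff its endpoints lie in different sets."""
        return self.find(u) != self.find(v)

# The blossom B = (b, p, u, v, q, b) of Figure 13 with stem (r, x, b).
partition = SuperNodes(["r", "x", "b", "p", "u", "v", "q", "z"])
partition.shrink(["b", "p", "u", "v", "q"], "b")
```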

There are algorithms with better running times for the matching problem. For the bipartite
case, Hopcroft and Karp showed that the running time of the augmenting path algorithm
can be reduced to $O(\sqrt{n}\,m)$. The basic idea is to augment the current matching in each
iteration by a maximal set of node-disjoint shortest paths (in terms of number of edges).
Using this idea, one can show that the shortest path lengths increase with each iteration.
Now, fix an arbitrary matching $M$ and suppose $|M| \le k - \sqrt{k}$, where $k = |M^*|$ is the
cardinality of a maximum matching. It is not hard to see that then there is an $M$-augmenting
path of length at most $2\sqrt{k} + 1$. That is, after at most $2\sqrt{k} + 1$ iterations, the algorithm
has found a matching $M$ of size at least $k - \sqrt{k}$. After at most $\sqrt{k}$ additional iterations,
the algorithm terminates with a maximum matching. Each iteration can be implemented
to run in time $O(n + m)$, which gives a total running time $O(\sqrt{k}\,m) = O(\sqrt{n}\,m)$. A similar
idea can be used in the general case to obtain an algorithm that computes a maximum
matching in time $O(\sqrt{n}\,m)$.

References 

The presentation of the material in this section is based on [1, Chapter 12] and [2, Chapter
5].

Figure 14: (a) Finite point set $S$. (b) Convex hull conv-hull($S$) and a separating inequality
for $v \notin$ conv-hull($S$).

8. Integrality of Polyhedra 

8.1 Introduction 

Many algorithms for combinatorial optimization problems crucially exploit min-max re- 
lations in order to prove optimality of the computed solution. We have seen examples 
of such algorithms for the maximum flow problem, minimum cost flow problem and the 
matching problem. A question that arises is whether there is a general approach to derive 
such min-max relations. As we will see in this section, such relations can often be derived 
via polyhedral methods. 

8.2 Convex Hulls 

Suppose we are given a finite set $S = \{s_1, \dots, s_k\} \subseteq \mathbb{R}^n$ of $n$-dimensional vectors. A vector
$x \in \mathbb{R}^n$ is a convex combination of the vectors in $S$ if there exist non-negative scalars
$\lambda_1, \dots, \lambda_k$ with $\sum_{i=1}^k \lambda_i = 1$ such that $x = \sum_{i=1}^k \lambda_i s_i$. The convex hull of $S$ is defined as the
set of all convex combinations of vectors in $S$. Subsequently, we use conv-hull($S$) to refer
to the convex hull of $S$.

Suppose we want to solve the following mathematical programming problem: Given
some $w \in \mathbb{R}^n$, $\max\{w^T x \mid x \in S\}$. Intuitively, it is clear that this is the same as
maximizing $w^T x$ over the convex hull of $S$.

Theorem 8.1. Let $S \subseteq \mathbb{R}^n$ be a finite set and let $w \in \mathbb{R}^n$. Then

    $\max\{w^T x \mid x \in S\} = \max\{w^T x \mid x \in \text{conv-hull}(S)\}$.

Proof. Let $x \in \text{conv-hull}(S)$, say $x = \lambda_1 s_1 + \dots + \lambda_k s_k$. Then

    $w^T x = \lambda_1 w^T s_1 + \dots + \lambda_k w^T s_k \le \max\{w^T x \mid x \in S\}$.

Thus $\max\{w^T x \mid x \in \text{conv-hull}(S)\} \le \max\{w^T x \mid x \in S\}$. Equality now follows because
$S \subseteq \text{conv-hull}(S)$. □
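Theorem 8.1 can be illustrated numerically: sampling convex combinations of a small point set never yields a larger objective value than the best point of $S$ itself. The point set, the objective vector and the helper names in the following sketch are hypothetical, and the sampling is only an illustration, not a proof.

```python
import random

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def random_convex_combination(S, rng):
    """Draw non-negative weights summing to one and combine the points of S."""
    weights = [rng.random() for _ in S]
    total = sum(weights)
    lambdas = [wt / total for wt in weights]
    dimension = len(S[0])
    return [sum(lam * s[j] for lam, s in zip(lambdas, S)) for j in range(dimension)]

S = [(0.0, 0.0), (2.0, 1.0), (1.0, 3.0), (-1.0, 2.0)]
w = (1.0, 2.0)

best_over_S = max(dot(w, s) for s in S)
rng = random.Random(0)   # fixed seed for reproducibility
best_sampled = max(dot(w, random_convex_combination(S, rng)) for _ in range(1000))
```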

The next proposition states that if $v \in \mathbb{R}^n \setminus \text{conv-hull}(S)$ then there must exist a separating
inequality $w^T x \le t$ that separates $v$ from conv-hull($S$), i.e., $w^T x \le t$ for all $x \in \text{conv-hull}(S)$
but $w^T v > t$.

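Theorem 8.2. Let $S \subseteq \mathbb{R}^n$ be a finite set and let $v \in \mathbb{R}^n \setminus \text{conv-hull}(S)$. Then there is a
separating inequality $w^T x \le t$ that separates $v$ from conv-hull($S$).

Proof. Note that verifying whether $v \in \text{conv-hull}(S)$ is equivalent to checking whether
there is a solution $(\lambda_1, \dots, \lambda_k)$ to the following linear system:

    $\sum_{i=1}^k \lambda_i s_i = v$                                   (9)
    $\sum_{i=1}^k \lambda_i = 1$                                       (10)
    $\lambda_i \ge 0 \quad \forall i \in \{1, \dots, k\}$              (11)

Conversely, $v \in \mathbb{R}^n \setminus \text{conv-hull}(S)$ iff the above linear system has no solution. Using
Farkas Lemma (see below) with

    $A = \begin{pmatrix} s_{1,1} & \cdots & s_{k,1} \\ \vdots & & \vdots \\ s_{1,n} & \cdots & s_{k,n} \\ 1 & \cdots & 1 \end{pmatrix}$   and   $b = \begin{pmatrix} v_1 \\ \vdots \\ v_n \\ 1 \end{pmatrix}$

we obtain that $v \in \mathbb{R}^n \setminus \text{conv-hull}(S)$ iff there exist $y \in \mathbb{R}^n$ and $z \in \mathbb{R}$ such that

    $(y^T\ z) A \ge 0$   and   $(y^T\ z) b < 0$,

or, equivalently,

    $y^T s_i \ge -z \quad \forall i \in \{1, \dots, k\}$
    $y^T v < -z$.

By setting $w = -y$ and $t = z$, we obtain that $w^T s_i \le t$ for every $i \in \{1, \dots, k\}$. Theorem 8.1
implies that $w^T x \le t$ for every $x \in \text{conv-hull}(S)$. Moreover, $w^T v > t$, which concludes the
proof. □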
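As a small worked example (the point set and the vector $v$ below are chosen purely for illustration), consider a triangle in the plane and a point outside of it. The certificate $(y, z)$ from the proof and the resulting separating inequality can be checked by hand:

```latex
% Illustrative instance: S is a triangle, v lies outside of it.
S = \{ s_1, s_2, s_3 \} = \{ (0,0)^T,\ (1,0)^T,\ (0,1)^T \}, \qquad v = (1,1)^T .

% Farkas certificate: y = (-1,-1)^T and z = 1.
y^T s_1 = 0 \ge -1, \qquad y^T s_2 = -1 \ge -1, \qquad y^T s_3 = -1 \ge -1,
\qquad y^T v = -2 < -1 .

% Separating inequality with w = -y = (1,1)^T and t = z = 1.
w^T s_i \le 1 \ \text{ for } i = 1,2,3, \qquad w^T v = 2 > 1 .
```

The inequality $x_1 + x_2 \le 1$ is thus valid for the whole triangle and violated by $v$.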



We state the following proposition without proof.

Proposition 8.1 (Farkas Lemma). The system $Ax = b$ has a non-negative solution $x \ge 0$
if and only if there is no vector $y$ such that $y^T A \ge 0$ and $y^T b < 0$.

Figure 15: (a) Polytope described by five linear inequalities. (b) Faces of the polytope
(indicated in bold).

8.3 Polytopes 

A polyhedron $P \subseteq \mathbb{R}^n$ is described by a system of linear inequalities, i.e.,
$P = \{x \in \mathbb{R}^n \mid Ax \le b\}$. A polyhedron $P$ is a polytope if $P$ is bounded, i.e., there exist
$l, u \in \mathbb{R}^n$ such that $l \le x \le u$ for every $x \in P$.

An inequality $w^T x \le t$ is called valid for a polyhedron $P$ if $P \subseteq \{x \in \mathbb{R}^n \mid w^T x \le t\}$. A
hyperplane is given by $\{x \in \mathbb{R}^n \mid w^T x = t\}$. It is called a supporting hyperplane if
$w^T x \le t$ is valid for $P$ and $P \cap \{x \mid w^T x = t\} \neq \emptyset$. The intersection of a supporting
hyperplane with $P$ is called a face. In the plane, the faces of a polyhedron are the edges and
corner points of $P$.

Lemma 8.1. A non-empty set $F \subseteq P = \{x \mid Ax \le b\}$ is a face of $P$ if and only if for some
subsystem $A^0 x \le b^0$ of $Ax \le b$, we have $F = \{x \in P \mid A^0 x = b^0\}$. Moreover, if $F$ is an
(inclusionwise) minimal face of $P$, then the rank of $A^0$ is equal to the rank of $A$.

A vector $v \in P$ is a vertex of $P$ if $\{v\}$ is a face of $P$. A polyhedron $P$ is pointed if it has at
least one vertex.

Lemma 8.2. If a polyhedron $P$ is pointed then every minimal non-empty face of $P$ is a
vertex.

Lemma 8.3. Let $P = \{x \mid Ax \le b\}$ and $v \in P$. Then $v$ is a vertex of $P$ if and only if $v$
cannot be written as a convex combination of vectors in $P \setminus \{v\}$.

Proof. Suppose $v$ is a vertex of $P$ and let $A^0 x \le b^0$ be a subsystem of $Ax \le b$ such that
$\{v\} = \{x \in P \mid A^0 x = b^0\}$. Suppose $v$ can be written as a convex combination
$\lambda_1 x_1 + \dots + \lambda_k x_k$ of vectors $x_1, \dots, x_k \in P \setminus \{v\}$ with all $\lambda_i > 0$. Since $A^0 x_i \le b^0$ for
every $i$ and $A^0 v = b^0$, we must have $A^0 x_i = b^0$ for every $i \in \{1, \dots, k\}$. But this is a
contradiction to the assumption that $v$ is the only point of $P$ that satisfies $A^0 x = b^0$.

Conversely, suppose $v$ cannot be written as a convex combination of vectors in $P \setminus \{v\}$.
Let $A^0 x \le b^0$ consist of the inequalities of $Ax \le b$ which $v$ satisfies with equality. Let
$F = \{x \mid A^0 x = b^0\}$. It suffices to show that $F = \{v\}$. Suppose that this is not true. Let
$u \in F \setminus \{v\}$ and consider the line $L = \{v + \lambda(u - v) \mid \lambda \in \mathbb{R}\}$ through $u$ and $v$. Clearly,
$L \subseteq F$. For every inequality $a_i x \le b_i$ of $Ax \le b$ which is not part of $A^0 x \le b^0$, we have
$a_i v < b_i$. We can therefore determine a sufficiently small $\varepsilon > 0$ such that
$v^+ = v + \varepsilon(u - v) \in P$ and $v^- = v - \varepsilon(u - v) \in P$. But $v = \frac{1}{2}(v^+ + v^-)$, which is a
contradiction. □

Theorem 8.3. A polytope is equal to the convex hull of its vertices.

Proof. Let $P$ be a polytope. Since $P$ is bounded, $P$ must be pointed. Let $V = \{v_1, \dots, v_k\}$
be the vertices of $P$. Clearly, $\text{conv-hull}(V) \subseteq P$. It remains to be shown that
$P \subseteq \text{conv-hull}(V)$. Suppose there exists some $u \in P \setminus \text{conv-hull}(V)$. Then by Theorem 8.2,
there exists an inequality $w^T x \le t$ that separates $u$ from conv-hull($V$), i.e., $w^T x \le t$ for
every $x \in \text{conv-hull}(V)$ and $w^T u > t$. Let $t^* = \max\{w^T x \mid x \in P\}$ and consider the face
$F = \{x \in P \mid w^T x = t^*\}$. Because $u \in P$, we have $t^* \ge w^T u > t$. Since every vertex $v_i$
satisfies $w^T v_i \le t < t^*$, the face $F$ contains no vertex of $P$. This contradicts Lemma 8.2,
by which the non-empty face $F$ must contain a vertex. □

Theorem 8.4. A set P is a polytope if and only if there exists a finite set V such that P is 
the convex hull ofV. 

The above theorem suggests the following approach to obtain a min-max relation for a 
combinatorial optimization problem. 

1. Formulate the combinatorial problem $\Pi$ as an optimization problem over a finite
set $S$ of feasible solutions (e.g., by considering all characteristic vectors).

2. Determine a linear description of conv-hull($S$).

3. Use duality of linear programming theory to obtain a min-max relation.

Note that by Theorem 8.1, solving the problem $\Pi$ over $S$ is equivalent to solving the
problem over conv-hull($S$). By Theorem 8.4, there must exist a polyhedral description of
conv-hull($S$). Thus, $\Pi$ can be described as a linear program. Dualizing and using strong
duality, we can deduce a min-max relation for the problem.

The results given above show that this approach is applicable in principle. However,
there are (at least) two difficulties: (i) It is not clear how to derive a linear
description of conv-hull($S$). (ii) Even though such a description is guaranteed to
exist, the number of linear inequalities might be far larger than the size of the original
problem. That is, even if we are able to come up with such a description, this might not
lead to a polynomial-time algorithm.

We exemplify the above approach for the perfect matching problem in bipartite graphs.
Let $G = (V,E)$ be a bipartite graph. Recall that a matching is perfect if every node of the
graph is matched. Define $PM(G) \subseteq \mathbb{R}^E$ as the set of characteristic vectors of the perfect
matchings of $G$.

Theorem 8.5 (Birkhoff's Theorem). Let $G = (V,E)$ be a bipartite graph. The convex hull
of $PM(G)$ is described by

$$
\begin{aligned}
\sum_{e = (u,v) \in E} x_e &= 1 && \forall u \in V \\
x_e &\ge 0 && \forall e \in E
\end{aligned}
\tag{12}
$$



Proof. Let $P$ be the polytope defined by (12). Clearly, each perfect matching $x \in PM(G)$
is contained in $P$. Since the integral vectors in $P$ are exactly the characteristic vectors of
perfect matchings, it suffices to show that all vertices of $P$ are integral. Suppose for the sake
of contradiction that $x$ is a vertex of $P$ that is not integral. Let $\tilde{E} = \{e \in E \mid 0 < x_e < 1\}$
be the fractional edges of $x$. Because $\sum_{e = (u,v) \in E} x_e = 1$ for every node $u \in V$, each node
incident to an edge in $\tilde{E}$ is incident to at least two edges in $\tilde{E}$. Thus, there exists a cycle
$C$ in $\tilde{E}$. Also, $C$ must be even because $G$ is bipartite. Let $d \in \mathbb{R}^E$ be a vector that is $0$ for
all edges not in $C$ and alternately $1$ and $-1$ for the edges along $C$. Because all edges of
$C$ are contained in $\tilde{E}$, there is an $\varepsilon > 0$ such that $x^+ = x + \varepsilon d$ and $x^- = x - \varepsilon d$ are in $P$.
Note that $x = \frac{1}{2}(x^+ + x^-)$. But this is a contradiction to the assumption that $x$ is a vertex
of $P$. □
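The perturbation step of the proof can be traced on a tiny instance. The following is only an illustrative sketch: the graph (a 4-cycle, i.e. the complete bipartite graph on two plus two nodes), the node names and the helper `in_polytope` are assumptions made for this example and do not come from the text.

```python
from fractions import Fraction

# Bipartite 4-cycle: left nodes a, b; right nodes c, d.
# The edges are listed in the order in which they appear along the even cycle C.
nodes = ["a", "b", "c", "d"]
edges = [("a", "c"), ("c", "b"), ("b", "d"), ("d", "a")]

def in_polytope(x):
    """Check the constraints (12): x_e >= 0 and the x_e around every node sum to 1."""
    if any(value < 0 for value in x):
        return False
    return all(
        sum(x[i] for i, e in enumerate(edges) if u in e) == 1
        for u in nodes
    )

half = Fraction(1, 2)
x = [half, half, half, half]   # a fractional point of P
d = [1, -1, 1, -1]             # alternately +1 and -1 along C, 0 elsewhere
eps = half                     # any 0 < eps <= 1/2 keeps both points in P here

x_plus = [xe + eps * de for xe, de in zip(x, d)]
x_minus = [xe - eps * de for xe, de in zip(x, d)]
midpoint = [(p + q) / 2 for p, q in zip(x_plus, x_minus)]

print(in_polytope(x_plus), in_polytope(x_minus), midpoint == x)
```

With this choice of eps the two points are the characteristic vectors of the two perfect matchings of the cycle, so the fractional point is their midpoint and cannot be a vertex.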



8.4 Integral Polytopes 

Many combinatorial optimization problems can naturally be formulated as an integer lin- 
ear program. Such programs are in general hard to solve. However, sometimes we are 
able to derive a polyhedral description of the problem: Suppose that by relaxing the
integrality constraints of the IP formulation of the optimization problem we obtain a linear
program whose feasible region is an integral polyhedron. We can then solve the optimization
problem in polynomial time simply by computing an optimal solution to the LP, e.g., by using
Khachiyan's ellipsoid method. An important question in this context is therefore whether 
a resulting polyhedron is integral. Proving integrality of polyhedra is often a difficult task. 
We next consider a technique that facilitates showing that a polyhedron is integral. 

Subsequently, we concentrate on rational polyhedra, i.e., polyhedra that are defined by 
rational linear inequalities. A rational polyhedron P is called integral if every non-empty 
face F of P contains an integral vector. Clearly, it suffices to show that every minimal 
face of P is integral because every face contains a minimal face. Note that if P is pointed 
then this is equivalent to showing that every vertex of P is integral. 

Lemma 8.4. Let $B \in \mathbb{Z}^{m \times m}$ be an invertible matrix. Then $B^{-1} b$ is integral for every
integral vector $b$ if and only if $\det(B) = \pm 1$.

Proof. Suppose $\det(B) = \pm 1$. By Cramer's Rule, $B^{-1}$ is integral, which implies that $B^{-1} b$
is integral for every integral $b$. Conversely, suppose $B^{-1} b$ is integral for every integral
vector $b$. Then also $B^{-1} e_i$ is integral for all $i \in \{1, \dots, m\}$, where $e_i$ is the $i$th unit vector.
As a consequence, $B^{-1}$ is integral. Thus, $\det(B)$ and $\det(B^{-1})$ are both integers. This in
combination with $\det(B)\det(B^{-1}) = 1$ implies that $\det(B) = \pm 1$. □
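As a small worked illustration of Lemma 8.4 (the two matrices below are chosen for this example only and do not appear in the text), compare a matrix of determinant $1$ with one of determinant $2$:

```latex
B = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}, \quad
\det(B) = 2 \cdot 1 - 1 \cdot 1 = 1, \quad
B^{-1} = \begin{pmatrix} 1 & -1 \\ -1 & 2 \end{pmatrix}
\quad \text{(integral, so } B^{-1} b \text{ is integral for every integral } b\text{)},

B' = \begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix}, \quad
\det(B') = 2, \quad
B'^{-1} e_1 = \begin{pmatrix} 1/2 \\ 0 \end{pmatrix}
\quad \text{(not integral, although } e_1 \text{ is)}.
```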

A matrix $A$ is totally unimodular if every square submatrix of $A$ has determinant $0$, $1$ or
$-1$. Clearly, every entry in a totally unimodular matrix is $0$, $1$ or $-1$.
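The definition can be checked directly on small matrices by enumerating all square submatrices. The sketch below does exactly that; the function names `det` and `is_totally_unimodular` are hypothetical, and the brute-force enumeration takes exponential time, so it is meant for illustration only and not as a practical test.

```python
from itertools import combinations

def det(matrix):
    """Integer determinant by Laplace expansion along the first row (small matrices only)."""
    if not matrix:
        return 1
    total = 0
    for j, entry in enumerate(matrix[0]):
        if entry == 0:
            continue
        minor = [row[:j] + row[j + 1:] for row in matrix[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def is_totally_unimodular(a):
    """Brute force: every square submatrix must have determinant in {-1, 0, 1}."""
    m, n = len(a), len(a[0])
    for t in range(1, min(m, n) + 1):
        for rows in combinations(range(m), t):
            for cols in combinations(range(n), t):
                sub = [[a[i][j] for j in cols] for i in rows]
                if det(sub) not in (-1, 0, 1):
                    return False
    return True

path = [[1, -1, 0], [0, 1, -1]]               # totally unimodular
triangle = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]  # entries in {0, 1}, but determinant 2
print(is_totally_unimodular(path), is_totally_unimodular(triangle))
```

The second matrix shows that having all entries in $\{0, 1, -1\}$ is necessary but not sufficient for total unimodularity.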
Theorem 8.6. Let $A \in \mathbb{Z}^{m \times n}$ be a totally unimodular matrix and let $b \in \mathbb{Z}^m$. Then the
polyhedron $P = \{x \mid Ax \le b\}$ is integral.

Proof. Let $F$ be a minimal face of $P$. Then $F = \{x \mid A^\circ x = b^\circ\}$ for some subsystem
$A^\circ x \le b^\circ$ of $Ax \le b$ and $A^\circ$ has full row rank. By reordering the columns of $A^\circ$ we may
write $A^\circ$ as $(B \; N)$, where $B$ is a basis of $A^\circ$. Because $A$ is totally unimodular and $B$ is a
basis, $\det(B) = \pm 1$. By Lemma 8.4, it follows that $x = \binom{B^{-1} b^\circ}{0}$ is an integral vector in
$F$. □

Let $A \in \mathbb{R}^{m \times n}$ be a matrix of full row rank. A basis $B$ of $A$ is a non-singular submatrix of
$A$ of order $m$. A matrix $A$ of full row rank is unimodular if $A$ is integral and each basis $B$
of $A$ has $\det(B) = \pm 1$.

Theorem 8.7. Let $A \in \mathbb{Z}^{m \times n}$ be a matrix of full row rank. Then the polyhedron
$P = \{x \mid Ax = b,\ x \ge 0\}$ is integral for every vector $b \in \mathbb{Z}^m$ if and only if $A$ is unimodular.

Proof. Suppose $A$ is unimodular. Let $b \in \mathbb{Z}^m$ and let $x$ be a vertex of $P$. (Note that the
non-negativity constraint ensures that $P$ has vertices.) Then there are $n$ linearly independent
constraints satisfied by $x$ with equality. The columns of $A$ corresponding to non-zero
entries of $x$ are linearly independent. We can extend these columns to a basis $B$ of $A$.
Note that $\det(B) = \pm 1$ because $A$ is unimodular. Then $x$ restricted to the coordinates
corresponding to $B$ is $B^{-1} b$, which is integral by Lemma 8.4. The remaining entries of $x$
are zero. Thus, $x$ is integral.

Conversely, assume that $P$ is integral for every integral vector $b$. Let $B$ be a basis of $A$. We
need to show that $\det(B) = \pm 1$. By Lemma 8.4, it suffices to show that $B^{-1} v$ is integral for
every integral vector $v$. Let $v$ be an integral vector. Let $y$ be an integral vector such that
$z = y + B^{-1} v \ge 0$ and let $b = Bz = B(y + B^{-1} v) = By + v$. Note that $b$ is integral. By
adding zero components to $z$, we obtain a vector $z' \in \mathbb{R}^n$ such that $Az' = Bz = b$. Then $z'$
is a vertex of $\{x \mid Ax = b,\ x \ge 0\}$, because $z'$ is in the polyhedron and satisfies $n$ linearly
independent constraints with equality: the $m$ equations $Ax = b$ and the $n - m$ equations
$x_i = 0$ for the columns $i$ outside of $B$. So $z'$ is integral and thus $B^{-1} v = z - y$ is integral. □

Theorem 8.8. Let $A \in \mathbb{Z}^{m \times n}$. The polyhedron $P = \{x \mid Ax \le b,\ x \ge 0\}$ is integral for
every vector $b \in \mathbb{Z}^m$ if and only if $A$ is totally unimodular.

Proof. It is not hard to show that $A$ is totally unimodular if and only if $(A \; I)$ is unimodular,
where $I$ is the $m \times m$ identity matrix. By Theorem 8.7, $(A \; I)$ is unimodular if and only
if $P' = \{z \mid (A \; I) z = b,\ z \ge 0\}$ is integral for every $b \in \mathbb{Z}^m$. The latter is equivalent to
$P = \{x \mid Ax \le b,\ x \ge 0\}$ being integral for every $b \in \mathbb{Z}^m$. □

8.5 Example Applications 

Theorem 8.9. A matrix $A$ is totally unimodular if

1. each entry is $0$, $1$ or $-1$;

2. each column contains at most two non-zeros;

3. the set $N$ of row indices of $A$ can be partitioned into $N_1 \cup N_2$ so that in each column
$j$ with two non-zeros we have $\sum_{i \in N_1} a_{ij} = \sum_{i \in N_2} a_{ij}$.

Proof. Suppose that $A$ is not totally unimodular. Let $t$ be the smallest integer such that $B$
is a $t \times t$ square submatrix of $A$ with $\det(B) \notin \{-1,0,1\}$. Suppose $B$ has a column $j$ with
a single non-zero entry, say $b_{kj}$. By expanding the determinant along column $j$ (Laplace
expansion), we obtain

$$\det(B) = \sum_{i=1}^{t} (-1)^{i+j} b_{ij} M_{ij} = (-1)^{k+j} b_{kj} M_{kj},$$

where $M_{ij}$ is the minor defined as the determinant of the submatrix obtained by removing
row $i$ and column $j$ from $B$. By (1), $b_{kj} \in \{-1,0,1\}$ and because $\det(B) \notin \{-1,0,1\}$,
$M_{kj} \notin \{-1,0,1\}$, which is a contradiction to the choice of $B$. A column of $B$ without
non-zero entries would give $\det(B) = 0$. By (2), every column of $B$ must therefore contain
exactly two non-zero entries. By (3), adding up the rows of $B$
($N_1$ with positive sign, $N_2$ with negative sign) yields the zero vector. The row vectors are
therefore linearly dependent and thus $\det(B) = 0$, which is a contradiction. □

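The incidence matrix $A = (a_{u,e})$ of an undirected graph $G = (V,E)$ is an $n \times m$ matrix
($n = |V|$ and $m = |E|$) such that for every $u \in V$ and $e \in E$:

$$a_{u,e} = \begin{cases} 1 & \text{if } e = (u,v) \in E \\ 0 & \text{otherwise.} \end{cases}$$

The incidence matrix $A = (a_{u,e})$ of a directed graph $G = (V,E)$ is an $n \times m$ matrix such
that for every $u \in V$ and $e \in E$:

$$a_{u,e} = \begin{cases} 1 & \text{if } e = (u,v) \in E \\ -1 & \text{if } e = (v,u) \in E \\ 0 & \text{otherwise.} \end{cases}$$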

The following corollary follows immediately from Theorem 8.9. 

Corollary 8.1. If $A$ is an incidence matrix of an undirected bipartite graph or an incidence
matrix of a directed graph, then $A$ is totally unimodular.

Proof. The proof follows from Theorem 8.9 by choosing $N_1 = V_0$ and $N_2 = V_1$ in the
bipartite case (where $V = V_0 \cup V_1$) and $N_1 = V$ and $N_2 = \emptyset$ in the directed case. □
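The partitions used in this proof can be checked mechanically on small graphs. The following sketch builds incidence matrices as defined above and tests conditions 1 to 3 of Theorem 8.9 for a given row set $N_1$ (with $N_2$ the remaining rows). The helper names `incidence_matrix` and `satisfies_theorem_8_9` and the example graphs are assumptions made for illustration.

```python
def incidence_matrix(nodes, edges, directed=False):
    """Rows are nodes, columns are edges (u, v); a directed edge gets +1 at u and -1 at v."""
    index = {u: i for i, u in enumerate(nodes)}
    a = [[0] * len(edges) for _ in nodes]
    for j, (u, v) in enumerate(edges):
        a[index[u]][j] = 1
        a[index[v]][j] = -1 if directed else 1
    return a

def satisfies_theorem_8_9(a, n1):
    """Check conditions 1-3 of Theorem 8.9 for the row partition (n1, remaining rows)."""
    rows = range(len(a))
    for j in range(len(a[0])):
        column = [a[i][j] for i in rows]
        if any(entry not in (-1, 0, 1) for entry in column):
            return False                      # condition 1 violated
        nonzeros = [i for i in rows if column[i] != 0]
        if len(nonzeros) > 2:
            return False                      # condition 2 violated
        if len(nonzeros) == 2:
            left = sum(column[i] for i in nonzeros if i in n1)
            right = sum(column[i] for i in nonzeros if i not in n1)
            if left != right:
                return False                  # condition 3 violated
    return True

# Bipartite graph with V_0 = {u1, u2} (rows 0, 1) and V_1 = {w1, w2} (rows 2, 3).
bip = incidence_matrix(["u1", "u2", "w1", "w2"],
                       [("u1", "w1"), ("u1", "w2"), ("u2", "w1")])
# Directed 3-cycle; here N_1 = V and N_2 is empty.
dig = incidence_matrix([0, 1, 2], [(0, 1), (1, 2), (2, 0)], directed=True)

print(satisfies_theorem_8_9(bip, {0, 1}), satisfies_theorem_8_9(dig, {0, 1, 2}))
```

Note that Theorem 8.9 gives a sufficient condition only: a `False` result for one particular partition does not by itself show that a matrix fails to be totally unimodular.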

Recall that a node cover of an undirected graph $G = (V,E)$ is a subset $C \subseteq V$ such that for
every edge $e = (u,v)$ at least one of the endpoints is in $C$, i.e., $\{u,v\} \cap C \neq \emptyset$. Let $\nu(G)$ be
the size of a maximum matching of $G$ and let $\tau(G)$ be the size of a minimum node cover
of $G$.

The size of a maximum matching can be formulated as an integer program:

$$
\begin{aligned}
\nu(G) = \text{maximize} \quad & \sum_{e \in E} x_e \\
\text{subject to} \quad & \sum_{e = (u,v) \in E} x_e \le 1 && \forall u \in V \\
& x_e \in \{0,1\} && \forall e \in E
\end{aligned}
$$

Equivalently, we can write this IP in a more compact way:

$$\nu(G) = \max\{\mathbf{1}^T x \mid Ax \le \mathbf{1},\ x \ge 0,\ x \in \mathbb{Z}^m\}, \tag{13}$$

where $A \in \mathbb{Z}^{n \times m}$ is the incidence matrix of $G$ with $n = |V|$ and $m = |E|$.
Similarly, the size of a minimum node cover can be expressed as

$$
\begin{aligned}
\tau(G) = \text{minimize} \quad & \sum_{u \in V} y_u \\
\text{subject to} \quad & y_u + y_v \ge 1 && \forall (u,v) \in E \\
& y_u \in \{0,1\} && \forall u \in V
\end{aligned}
$$

or, equivalently,

$$\tau(G) = \min\{y^T \mathbf{1} \mid A^T y \ge \mathbf{1},\ y \ge 0,\ y \in \mathbb{Z}^n\}. \tag{14}$$

Theorem 8.10. Let $G = (V,E)$ be a bipartite graph. The size of a maximum matching of
$G$ is equal to the size of a minimum node cover of $G$, i.e., $\nu(G) = \tau(G)$.

Proof. Let $A \in \mathbb{Z}^{n \times m}$ be the incidence matrix of $G$ with $n = |V|$ and $m = |E|$. As observed
above, we can express $\nu(G)$ and $\tau(G)$ by the two integer linear programs (13) and (14).
Consider the respective LP relaxations of (13) and (14):

$$\nu'(G) = \max\{\mathbf{1}^T x \mid Ax \le \mathbf{1},\ x \ge 0\} \tag{15}$$
$$\tau'(G) = \min\{y^T \mathbf{1} \mid A^T y \ge \mathbf{1},\ y \ge 0\} \tag{16}$$

Note that both LPs are feasible. Because $A$ (and hence also $A^T$) is totally unimodular, both
LPs have integral optimal solutions and thus $\nu(G) = \nu'(G)$ and $\tau(G) = \tau'(G)$. Finally,
observe that (16) is the dual of (15). By strong duality, $\nu'(G) = \tau'(G)$, which proves the
claim. □
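The statement of Theorem 8.10 can be checked by brute force on small graphs. The sketch below enumerates edge subsets and node subsets directly rather than solving the LPs (15) and (16); the function names and the two example graphs are assumptions made for illustration. The triangle shows that bipartiteness is needed.

```python
from itertools import combinations

def max_matching_size(edges):
    """nu(G): largest number of pairwise node-disjoint edges (brute force)."""
    for k in range(len(edges), 0, -1):
        for subset in combinations(edges, k):
            endpoints = [u for e in subset for u in e]
            if len(endpoints) == len(set(endpoints)):
                return k
    return 0

def min_node_cover_size(nodes, edges):
    """tau(G): smallest number of nodes that touch every edge (brute force)."""
    for k in range(len(nodes) + 1):
        for cover in combinations(nodes, k):
            chosen = set(cover)
            if all(u in chosen or v in chosen for u, v in edges):
                return k
    return len(nodes)

# A bipartite graph: nu(G) and tau(G) coincide.
bip_nodes = ["a1", "a2", "a3", "b1", "b2"]
bip_edges = [("a1", "b1"), ("a1", "b2"), ("a2", "b1"), ("a3", "b1")]
print(max_matching_size(bip_edges), min_node_cover_size(bip_nodes, bip_edges))

# A triangle is not bipartite: one matching edge, but two nodes are needed for a cover.
tri_nodes = [1, 2, 3]
tri_edges = [(1, 2), (2, 3), (1, 3)]
print(max_matching_size(tri_edges), min_node_cover_size(tri_nodes, tri_edges))
```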

A matrix $A$ is called an interval matrix if every entry of $A$ is either $0$ or $1$ and the $1$'s
of each row appear consecutively (without intervening zeros).

Theorem 8.11. Each interval matrix $A$ is totally unimodular.

Proof. Let $B$ be a $t \times t$ submatrix of $A$. Define a $t \times t$ matrix $N$ as follows:

$$
N = \begin{pmatrix}
1 & -1 & 0 & \cdots & 0 \\
0 & 1 & -1 & \cdots & 0 \\
\vdots & & \ddots & \ddots & \vdots \\
0 & 0 & \cdots & 1 & -1 \\
0 & 0 & \cdots & 0 & 1
\end{pmatrix}
$$

Note that $\det(N) = 1$. Consider $NB^T$. Then $NB^T$ is a submatrix of an incidence matrix
of some directed graph. (Think about it!) Therefore, $NB^T$ is totally unimodular. We
conclude

$$\det(B) = \det(N)\det(B^T) = \det(NB^T) \in \{-1,0,1\}.$$

□
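To see why $NB^T$ has the claimed form, it may help to work through a small case. The $3 \times 3$ interval matrix $B$ below is an assumption chosen for illustration. Row $i$ of $NB^T$ is row $i$ of $B^T$ minus row $i+1$ of $B^T$ (the last row is kept unchanged), so a block of consecutive $1$'s in a column of $B^T$ collapses to at most one $-1$ and at most one $+1$:

```latex
B = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}, \qquad
B^T = \begin{pmatrix} 1 & 0 & 1 \\ 1 & 1 & 1 \\ 0 & 1 & 1 \end{pmatrix}, \qquad
N = \begin{pmatrix} 1 & -1 & 0 \\ 0 & 1 & -1 \\ 0 & 0 & 1 \end{pmatrix},

N B^T = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 1 \end{pmatrix}, \qquad
\det(B) = 1 = \det(N B^T).
```

Every column of $NB^T$ contains at most one $+1$ and at most one $-1$, which is the column pattern of a submatrix of a directed incidence matrix.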
9. Complexity Theory



9.1 Introduction 

The problems that we have considered in this course so far are all solvable efficiently. 
This means that we were always able to design an algorithm for the respective optimiza- 
tion problem that solves every instance in time that is polynomially bounded in the size 
of the instance. For example, we have seen that every instance of the minimum spanning
tree problem with $n$ vertices and $m$ edges can be solved in time $O(m + n \log n)$. Unfortunately,
for many natural and fundamental optimization problems efficient algorithms are
not known to exist. A well-known example of such a problem is the traveling salesman 
problem. 

Traveling Salesman Problem (TSP): 

Given: An undirected graph $G = (V,E)$ and non-negative distances $d : E \to \mathbb{Z}^+$
on the edges.

Goal: Find a tour that visits every vertex of G exactly once (starting and ending 
in the same vertex) and has minimum total length. 

Despite 50 years of intensive research, no efficient algorithm has been found for the TSP.
On the other hand, researchers have also not been able to disprove the existence
of such algorithms. Roughly speaking, complexity theory aims to answer the question
whether the research community has been too stupid or unlucky to find efficient algorithms
for optimization problems such as the TSP, or whether these problems are in fact
intrinsically more difficult than other problems. It provides a mathematical framework to
separate problems that are computationally hard to solve from the ones that are efficiently
solvable.

In complexity theory one usually considers decision problems instead of optimization 
problems. 

Definition 9.1. A decision problem $\Pi$ is given by a set of instances $\mathcal{I}$. Each instance
$I \in \mathcal{I}$ specifies

• a set $\mathcal{F}$ of feasible solutions for $I$;

• a cost function $c : \mathcal{F} \to \mathbb{Z}$;

• an integer $K$.

Given an instance $I = (\mathcal{F}, c, K) \in \mathcal{I}$, the goal is to decide whether there exists a feasible
solution $S \in \mathcal{F}$ whose cost $c(S)$ is at most $K$. If there is such a solution, we say that $I$ is a
"yes-instance"; otherwise, $I$ is a "no-instance".

Example 9.1. The decision problem of the TSP is to determine whether for a
given instance $I = (G,d,K) \in \mathcal{I}$ there exists a tour in $G$ of total length at most $K$.

Many decision problems can naturally be described without the need of introducing a cost
function $c$ and a parameter $K$. Some examples are the following ones.

Prime:

Given: A natural number $n$.

Goal: Determine whether $n$ is a prime.

Graph Connectedness:

Given: An undirected graph $G = (V,E)$.

Goal: Determine whether $G$ is connected.

Hamiltonian Cycle: 

Given: An undirected graph G = (V,E). 

Goal: Determine whether G has a Hamiltonian cycle. 

Subsequently, we will mostly focus on decision problems. For notational convenience we 
will use the same naming as for the respective optimization counterparts (e.g., TSP will 
refer to the decision problem of TSP); no confusion should arise from this. 

Recall that an algorithm ALG for a problem $\Pi$ is said to be efficient if it solves every
instance $I \in \mathcal{I}$ of $\Pi$ in time that is bounded by a polynomial function of the size of $I$.
It is not hard to see that the decision version of an optimization problem is no harder than
the optimization problem itself. But in most cases, an efficient algorithm for solving
the decision version can also be turned into an efficient algorithm for the optimization
problem (e.g., by using binary search on the possible optimal value).
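The binary search mentioned above can be sketched as follows. The sketch assumes that the optimal value is an integer known to lie in a range `[low, high]` and that a decision procedure `decide(K)` answers whether a feasible solution of cost at most `K` exists; both the function name `optimal_value` and the toy instance are hypothetical.

```python
def optimal_value(decide, low, high):
    """Smallest integer K in [low, high] with decide(K) True.

    Assumes decide is monotone (False ... False True ... True) and decide(high) is True.
    Uses O(log(high - low)) calls to the decision procedure.
    """
    while low < high:
        mid = (low + high) // 2
        if decide(mid):
            high = mid        # a solution of cost at most mid exists
        else:
            low = mid + 1     # every feasible solution costs more than mid
    return low

# Toy instance: the feasible solutions are given explicitly by their costs.
costs = [17, 9, 23, 12]

def decide(k):
    return any(c <= k for c in costs)

print(optimal_value(decide, 0, sum(costs)))
```

Because the number of calls is logarithmic in the width of the range, it is polynomial in the encoding length of the numbers involved. Recovering an optimal solution itself, and not only its value, typically requires an additional problem-specific argument.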

9.2 Complexity Classes P and NP 

Intuitively, the complexity classes P and NP refer to decision problems that can be solved 
efficiently and those for which yes-instances can be verified efficiently, respectively. If we 
insisted on formal correctness here, we would define these classes in terms of a specific 
computer model called Turing machines. However, this is beyond the scope of this course 
and we therefore take the freedom to introduce these classes using a more high-level (but 
essentially equivalent) point of view. 

We define the complexity class P (which stands for polynomial-time). 

Definition 9.2. A decision problem $\Pi$ belongs to the complexity class P if there exists
an algorithm that for every instance $I \in \mathcal{I}$ determines in polynomial time whether $I$ is a
yes-instance or a no-instance.

All problems that we have treated so far in this course belong to this class. The linear
programming problem (LP) also belongs to this class, even though the simplex algorithm
is not a polynomial-time algorithm for LP (the interested reader is referred to Section 8.6
in [6]). The simplex algorithm is almost always very fast in practice, for LPs of any size,
but as mentioned before the running time of an algorithm is determined by
its worst-case running time. For most pivoting rules devised for the simplex algorithm,
instances have been constructed on which the algorithm has to visit an exponential
number of basic feasible solutions in order to arrive at an optimal one. A polynomial-time
algorithm for LP is the ellipsoid method (the interested reader is referred to Section 8.7 in
[6]). This algorithm is an example where the time bound is polynomial in the logarithm
of the largest coefficient in the instance, in addition to the number of variables and the
number of constraints. One of the most interesting open research questions in Operations
Research is whether there exists an algorithm for LP whose running time is polynomial in
the number of variables and the number of constraints only.

Next we define the complexity class NP. NP does not stand for "non-polynomial-time" 
as one might guess, but for "non-deterministic polynomial-time" because this class is 
formally defined in terms of non-deterministic Turing machines. 

Given a yes-instance $I \in \mathcal{I}$ of a decision problem $\Pi$, we say that $S$ is a certificate for
$I$ if $S \in \mathcal{F}$ and $c(S) \le K$. Note that every yes-instance $I$ must have a certificate. The
special feature of a problem in NP is that yes-instances admit certificates that can be
verified in polynomial time.

Definition 9.3. A decision problem $\Pi$ belongs to the complexity class NP if every
yes-instance $I \in \mathcal{I}$ admits a certificate whose validity can be verified in polynomial time.

Note that the polynomial-time verifiability of $S$ implies that the size of $S$ must be
polynomially bounded in $|I|$ (because we need to look at $S$ to verify its validity). That is, the
definition above also states that yes-instances of problems in NP have short, i.e.,
polynomially bounded, certificates.

We consider some examples: 

Example 9.2. The Hamiltonian cycle problem is in NP: A certificate for a yes-instance
corresponds to a set of edges $S \subseteq E$. One can verify in $O(n)$ time whether $S$ constitutes a
cycle in $G$ that visits all vertices exactly once.
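A certificate check of this kind might look as follows. This is a sketch under stated assumptions: the graph is simple, edges are given as pairs of nodes, and the function name `is_hamiltonian_cycle` is hypothetical. Building the edge set costs time linear in the size of the graph; the remaining checks take time linear in $n$.

```python
def is_hamiltonian_cycle(nodes, edges, certificate):
    """Verify that the edge set `certificate` is a cycle of G visiting every node exactly once."""
    edge_set = {frozenset(e) for e in edges}
    chosen = {frozenset(e) for e in certificate}
    # A Hamiltonian cycle consists of exactly n distinct edges, all taken from E.
    if len(nodes) < 3 or len(chosen) != len(nodes) or not chosen <= edge_set:
        return False
    neighbours = {u: [] for u in nodes}
    for e in chosen:
        if len(e) != 2:
            return False              # self-loops cannot be part of the cycle
        u, v = tuple(e)
        neighbours[u].append(v)
        neighbours[v].append(u)
    # Every node must have exactly two incident certificate edges.
    if any(len(adjacent) != 2 for adjacent in neighbours.values()):
        return False
    # Walk along the certificate; a single cycle returns to the start after exactly n steps.
    start = nodes[0]
    previous, current = None, start
    for steps in range(1, len(nodes) + 1):
        a, b = neighbours[current]
        previous, current = current, (b if a == previous else a)
        if current == start:
            return steps == len(nodes)
    return False

square = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
print(is_hamiltonian_cycle([0, 1, 2, 3], square, [(0, 1), (1, 2), (2, 3), (3, 0)]))
```

The final walk is needed because the degree condition alone would also accept a union of several disjoint cycles.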

Example 9.3. Consider the decision variant of the linear programming problem:

Linear Programming Problem (LP):

Given: A set $\mathcal{F}$ of feasible solutions $x = (x_1, \dots, x_n)$ defined by $m$ linear constraints

$$\mathcal{F} = \Big\{ (x_1, \dots, x_n) \in \mathbb{R}^n_{\ge 0} : \sum_{i=1}^{n} a_{ij} x_i \le b_j \text{ for every } j = 1, \dots, m \Big\}$$

together with an objective function $c(x) = \sum_{i=1}^{n} c_i x_i$ and a parameter $K$.

Goal: Determine whether there exists a feasible solution $x \in \mathcal{F}$ that satisfies
$c(x) \le K$.

LP is in NP: A certificate for a yes-instance corresponds to a solution $x = (x_1, \dots, x_n)$. We
need $O(n)$ time to verify each of the $m$ constraints and $O(n)$ time to compute the objective
function value $c(x)$. The total time needed to check whether $x \in \mathcal{F}$ and $c(x) \le K$ is thus
$O(nm)$.
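The verification just described can be written down directly. In this sketch the data layout is an assumption: `a[i][j]` holds the coefficient $a_{ij}$ of variable $x_i$ in constraint $j$, and the function name `verifies_lp_certificate` is hypothetical. Arithmetic on the numbers is counted as unit cost, as in the text.

```python
def verifies_lp_certificate(a, b, c, k, x):
    """Check x >= 0, sum_i a[i][j] * x[i] <= b[j] for every j, and c(x) <= k."""
    n, m = len(x), len(b)
    if any(value < 0 for value in x):
        return False
    for j in range(m):                                    # m constraints ...
        if sum(a[i][j] * x[i] for i in range(n)) > b[j]:  # ... each checked in O(n) time
            return False
    return sum(c[i] * x[i] for i in range(n)) <= k        # objective value in O(n) time

# Two variables, two constraints: -x1 - x2 <= -2 (i.e. x1 + x2 >= 2) and x1 <= 3.
a = [[-1, 1],
     [-1, 0]]
b = [-2, 3]
c = [1, 2]

print(verifies_lp_certificate(a, b, c, 3, [2, 0]))  # feasible, objective value 2
print(verifies_lp_certificate(a, b, c, 3, [0, 2]))  # feasible, but objective value 4
```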
9.3 Polynomial-time Reductions and NP-completeness



After thinking for a little while, we conclude that P $\subseteq$ NP. Several decades of intensive
research seem to suggest that there are problems in NP that are intrinsically more difficult
than the ones in P and thus P $\neq$ NP: Despite the many research efforts, no polynomial-time
algorithms have been found for problems in NP such as TSP, Hamiltonian cycle,
Steiner tree, etc. On the other hand, all attempts to show that these problems are in fact
harder than the ones in P failed as well. The question whether P $\neq$ NP is one of the
biggest mysteries in mathematics to date and constitutes one of the seven millennium-prize
problems; see http://www.claymath.org/millennium for more information.

Complexity theory attempts to give theoretical evidence for the conjecture that P $\neq$ NP. It
defines within the complexity class NP a subclass of most difficult problems, the so-called
NP-complete problems. This subclass is defined in such a way that if a polynomial-time
algorithm is ever found for any of the NP-complete problems, then this implies
that for every problem in NP there exists a polynomial-time algorithm, and thus P = NP.
The definition of this class crucially relies on the notion of polynomial-time reductions:

Definition 9.4. A polynomial-time reduction from a decision problem $\Pi_1$ to a decision
problem $\Pi_2$ is a function $\varphi : \mathcal{I}_1 \to \mathcal{I}_2$ that maps every instance $I_1 \in \mathcal{I}_1$ of $\Pi_1$ to an instance
$I_2 = \varphi(I_1) \in \mathcal{I}_2$ of $\Pi_2$ such that:

1. the mapping can be done in time that is polynomially bounded in the size of $I_1$;

2. $I_1$ is a yes-instance of $\Pi_1$ if and only if $I_2$ is a yes-instance of $\Pi_2$.

If there exists such a polynomial-time reduction from $\Pi_1$ to $\Pi_2$ then we say that $\Pi_1$ can be
reduced to $\Pi_2$, and we will write $\Pi_1 \preceq \Pi_2$.

Let us think about some consequences of the above definition in terms of polynomial-time
computability. Suppose $\Pi_1 \preceq \Pi_2$. Then $\Pi_2$ is at least as difficult to solve as $\Pi_1$ (which also
justifies the use of the symbol $\preceq$). To see this, note that every polynomial-time algorithm
ALG$_2$ for $\Pi_2$ can be used to derive a polynomial-time algorithm ALG$_1$ for $\Pi_1$ as follows:

1. Transform the instance $I_1$ of $\Pi_1$ to a corresponding instance $I_2 = \varphi(I_1)$ of $\Pi_2$.

2. Run ALG$_2$ on $I_2$ and report that $I_1$ is a yes-instance if and only if ALG$_2$ concluded
that $I_2$ is a yes-instance.

By the first property of Definition 9.4, the transformation in Step 1 above takes time
polynomial in the size $n_1 = |I_1|$ of $I_1$. As a consequence, the size $n_2 = |I_2|$ of $I_2$ is polynomially
bounded in $n_1$. (Think about it!) In Step 2, ALG$_2$ solves $I_2$ in time polynomial in the size
$n_2$ of $I_2$, which is polynomial in the size $n_1$ (see footnote 3). The overall time needed by ALG$_1$
to output a solution for $I_1$ is thus bounded by a polynomial in $n_1$. Note that the second property of
Definition 9.4 ensures that ALG$_1$ correctly identifies whether $I_1$ is a yes-instance or not.
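The two steps above amount to composing the reduction with the algorithm for the second problem. The sketch below illustrates this with clique as $\Pi_1$ and independent set as $\Pi_2$, using the complement-graph transformation as $\varphi$. All names are hypothetical, and the brute-force `has_independent_set` is only a stand-in for ALG$_2$: it is not a polynomial-time algorithm, the point is the composition.

```python
from itertools import combinations

def has_independent_set(instance):
    """Stand-in for ALG_2: decide independent set by brute force (exponential time)."""
    nodes, edges, k = instance
    edge_set = {frozenset(e) for e in edges}
    return any(
        all(frozenset(pair) not in edge_set for pair in combinations(subset, 2))
        for subset in combinations(nodes, k)
    )

def phi(instance):
    """Reduction from clique to independent set: complement the edge set, keep K."""
    nodes, edges, k = instance
    edge_set = {frozenset(e) for e in edges}
    complement = [(u, v) for u, v in combinations(nodes, 2)
                  if frozenset((u, v)) not in edge_set]
    return nodes, complement, k

def solve_via_reduction(instance_1, reduction, alg_2):
    """ALG_1: Step 1 transforms the instance, Step 2 runs ALG_2 and reports its answer."""
    instance_2 = reduction(instance_1)
    return alg_2(instance_2)

nodes = [1, 2, 3, 4]
edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
print(solve_via_reduction((nodes, edges, 3), phi, has_independent_set))  # clique {1, 2, 3}
print(solve_via_reduction((nodes, edges, 4), phi, has_independent_set))  # no clique of size 4
```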

Observe that the existence of a polynomial-time algorithm for $\Pi_1$ has in general no
implications for the existence of a polynomial-time algorithm for $\Pi_2$, even if we assume that we
can compute the inverse of $\varphi$ efficiently. The reason is that $\varphi$ is not necessarily a
bijection (in particular, it need not be onto) and may thus map the instances of $\Pi_1$ to a
subset of the instances of $\Pi_2$ which correspond to easy instances of $\Pi_2$. Thus, being able to
efficiently solve every instance of $\Pi_1$ reveals nothing about the problem of solving $\Pi_2$.

3 Observe that we exploit here that if $p_1, p_2$ are polynomial functions in $n$ then $p_2(p_1(n))$ is a
polynomial function in $n$.

It is not hard to show that polynomial-time reductions are transitive:

Lemma 9.1. If $\Pi_1 \preceq \Pi_2$ and $\Pi_2 \preceq \Pi_3$ then $\Pi_1 \preceq \Pi_3$.

We can now define the class of NP-complete problems.

Definition 9.5. A decision problem $\Pi$ is NP-complete if

1. $\Pi$ belongs to NP;

2. every problem in NP is polynomial-time reducible to $\Pi$.

Intuitively, the above definition states that an NP-complete problem is at least as difficult
as any other problem in NP. The above definition may not seem very helpful at first sight: How
do we prove that every problem in NP is polynomial-time reducible to the problem $\Pi$ we
are interested in? Let us assume for the time being that there are some problems that are
known to be NP-complete. In order to prove NP-completeness of $\Pi$ it is then sufficient to
show that $\Pi$ is in NP and that some NP-complete problem is polynomial-time reducible
to $\Pi$; this follows from the transitivity stated in Lemma 9.1. That is, showing
NP-completeness of a problem becomes much easier now because we "just" need to find an
appropriate NP-complete problem that can be reduced to it. Nevertheless, we remark that
the reductions of many NP-completeness proofs are highly non-trivial and often require a
deep understanding of the structural properties of the problem.

The class NP has a very precise definition in terms of executions of non-deterministic
Turing machines (which we skipped and persist in skipping), which enabled Stephen
Cook in 1971 to prove that any such execution can be reduced to an instance of a famous
problem in Boolean logic called the satisfiability problem (SAT) (stated below).
Thus, Cook provided us with a problem that is NP-complete. Starting from this, many
other problems were proven to be NP-complete.

In a way, proving that a problem is NP-complete is a beautiful way of stating (see footnote 4):

"I can't find an efficient algorithm, but neither can all these famous people."

4 The illustration is taken from the book [4], which is an excellent book on the complexity of
algorithms containing many fundamental NP-completeness proofs.

Figure 16: Reductions to prove NP-completeness of the example problems considered
here; proofs are given for the ones marked with (*).



9.4 Examples of NP-completeness Proofs 

We introduce some more problems and show that they are NP-complete. However, most 
of the reductions are technically involved and will be omitted here because the intention 
is to gain some basic understanding of the proof methodology rather than diving into 
technical details. 

We first introduce the satisfiability problem for which Cook established NP-completeness.
The basic ingredients are variables. A variable reflects an expression which can be TRUE
or FALSE. For example,

$x_1$ = Koen is taller than Michael and $x_2$ = Soup is always eaten with a fork.

A variable can also occur negated. For example, we write $\neg x_1$ to express that Koen is
not taller than Michael. A literal refers to a negated or unnegated variable. We compose
more complicated expressions, called clauses, from literals. An example of a clause is

$$C_1 = (x_1 \vee \neg x_2 \vee x_3 \vee x_4).$$

The interpretation is that clause $C_1$ is TRUE if and only if $x_1$ is TRUE or (indicated by $\vee$)
not-$x_2$ is TRUE or $x_3$ is TRUE or $x_4$ is TRUE. That is, a clause is TRUE if at least one of
its literals is TRUE. An instance of the SAT problem is a Boolean formula $F$ in so-called
conjunctive normal form (CNF):

$$F = C_1 \wedge C_2 \wedge \dots \wedge C_m,$$

where each $C_i$ is a clause. $F$ is TRUE if $C_1$ is TRUE and (indicated by $\wedge$) $C_2$ is TRUE and
... and $C_m$ is TRUE, i.e., if all its clauses are TRUE.

Satisfiability Problem (SAT):

Given: A Boolean formula $F$ in CNF.

Goal: Determine whether there is a TRUE/FALSE-assignment to the variables
such that $F$ is TRUE.
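The semantics of clauses and CNF formulas can be made concrete in a few lines. In the sketch below a formula is a list of clauses and a literal is a non-zero integer, with `i` standing for $x_i$ and `-i` for $\neg x_i$; this encoding and the function names are assumptions made for illustration. The search simply tries all $2^n$ assignments and is therefore not an efficient algorithm.

```python
from itertools import product

def evaluate(formula, assignment):
    """F is TRUE iff every clause contains at least one TRUE literal."""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in formula
    )

def satisfying_assignment(formula):
    """Try all 2^n TRUE/FALSE-assignments; return a satisfying one, or None."""
    variables = sorted({abs(lit) for clause in formula for lit in clause})
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if evaluate(formula, assignment):
            return assignment
    return None

c1 = [1, -2, 3, 4]                    # the clause C_1 from the text
print(evaluate([c1], {1: False, 2: True, 3: False, 4: False}))   # every literal is FALSE
print(evaluate([c1], {1: False, 2: False, 3: False, 4: False}))  # not-x_2 is TRUE
print(satisfying_assignment([[1], [-1]]))                        # x_1 and not-x_1 conflict
```

Checking a given assignment with `evaluate` takes time linear in the length of the formula, which is the kind of efficient certificate verification that membership in NP asks for.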
Theorem 9.1. SAT is NP-complete.

The proof is involved and skipped here (the interested reader is referred to Section 15.5
in [6]).

The following restriction of satisfiability is also NP-complete.

3-Satisfiability Problem (3-SAT):

Given: A Boolean formula F in CNF with each clause consisting of 3 literals. 
Goal: Determine whether there is a TRUE/FALSE-assignment to the variables 
such that F is TRUE. 

Theorem 9.2. 3-SAT is NP-complete. 

The proof reduces SAT to 3-SAT. We refer the reader to [6, Theorem 15.2]. 

We introduce some more problems and give some examples of NP-completeness proofs.

Let $G = (V,E)$ be an undirected graph. We need the following definitions: A clique of $G$
is a subset $V'$ of the vertices that induces a complete subgraph, i.e., for every two distinct
vertices $u, v \in V'$, $(u,v) \in E$. An independent set of $G$ is a subset $V'$ of vertices such that no
two of them are incident to the same edge, i.e., for every two vertices $u, v \in V'$, $(u,v) \notin E$. A
vertex cover of $G$ is a subset $V'$ of vertices such that every edge has at least one of its two
incident vertices in $V'$, i.e., for every edge $(u,v) \in E$, $\{u,v\} \cap V' \neq \emptyset$.

Clique: 

Given: An undirected graph G = (V,E) and an integer K. 
Goal: Determine whether G contains a clique of size at least K. 

Independent Set: 

Given: An undirected graph G = (V,E) and an integer K. 

Goal: Determine whether G contains an independent set of size at least K. 

Vertex Cover: 

Given: An undirected graph G = (V,E) and an integer K. 

Goal: Determine whether G contains a vertex cover of size at most K. 

Theorem 9.3. Vertex cover is NP-complete. 

Proof. We first argue that vertex cover is in NP. A certificate of a yes-instance is a subset
$V' \subseteq V$ of vertices with $|V'| \le K$ that forms a vertex cover of $G = (V,E)$. This can be
verified in time at most $O(n + m)$ by checking whether each edge $(u,v) \in E$ has at least
one of its incident vertices in $V'$.

In order to prove that vertex cover is NP-complete, we will show that 3-SAT $\preceq$
vertex cover. Note that this is sufficient because 3-SAT is NP-complete.

We transform an instance of 3-SAT to an instance of vertex cover as follows: Consider a
Boolean formula $F$ in CNF with each clause having three literals. Let $n$ and $m$ denote the
number of variables and clauses of $F$, respectively. We create a variable-gadget for each
variable $x$ consisting of two vertices $x$ and $\neg x$ that are connected by an edge. Moreover,
we create a clause-gadget for each clause $C = (l_1 \vee l_2 \vee l_3)$ consisting of three vertices
$l_1, l_2, l_3$ that are connected by a triangle. Finally, we connect each vertex representing a
literal in a clause-gadget to the corresponding vertex representing the same literal in the
variable-gadget. Let $G = (V,E)$ be the resulting graph; see Figure 17 for an example.
Note that this transformation can be done in polynomial time.

Figure 17: Illustration of the construction in the proof of Theorem 9.3 for the formula
$F = (x_1 \vee x_2 \vee \neg x_3) \wedge (\neg x_1 \vee x_2 \vee x_4) \wedge (\neg x_2 \vee x_3 \vee x_4)$. The red vertices constitute a vertex
cover of size $K = n + 2m = 10$.

We show that $F$ is satisfiable if and only if $G$ has a vertex cover of size at most $K = n + 2m$.
First note that every assignment satisfying $F$ can be turned into a vertex cover of size
$K$: For each variable-gadget we pick the vertex that corresponds to the literal which is
TRUE. This covers all edges in the variable-gadgets and those connections to the
clause-gadgets that end in a TRUE literal. Each clause contains at least one TRUE literal; for each
clause-gadget we pick the two vertices other than one such literal, which covers the triangle
and the remaining connections. The resulting vertex cover has size $K = n + 2m$
as claimed. Next suppose that we are given a vertex cover $V'$ of $G$ of size at most $K$.
Note that every vertex cover has to pick at least one vertex for every variable-gadget and
two vertices for each clause-gadget just to cover all edges inside these gadgets. Thus,
$V'$ contains exactly $K$ vertices: exactly one per variable-gadget and exactly two per
clause-gadget. Setting the literal picked in each variable-gadget to TRUE defines an assignment.
In every clause-gadget, the connection of the vertex that is not in $V'$ must be covered by its
variable-gadget endpoint, so the corresponding literal is TRUE and the clause is satisfied.
We conclude that yes-instances correspond under the above reduction, which completes the
proof. □
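The construction and the first direction of the proof can be sketched in code. The encoding is an assumption made for illustration: literals are non-zero integers (`i` for $x_i$, `-i` for $\neg x_i$), variable-gadget vertices are tuples `("var", literal)` and clause-gadget vertices are tuples `("clause", clause_index, position)`. The function names are hypothetical.

```python
def vertex_cover_instance(formula):
    """Build (vertices, edges, K) from a 3-CNF formula given as lists of three literals."""
    variables = sorted({abs(lit) for clause in formula for lit in clause})
    vertices, edges = [], []
    for v in variables:                        # variable-gadget: x -- not x
        vertices += [("var", v), ("var", -v)]
        edges.append((("var", v), ("var", -v)))
    for c, clause in enumerate(formula):       # clause-gadget: triangle on l1, l2, l3
        corners = [("clause", c, pos) for pos in range(3)]
        vertices += corners
        edges += [(corners[0], corners[1]), (corners[1], corners[2]), (corners[0], corners[2])]
        for corner, lit in zip(corners, clause):
            edges.append((corner, ("var", lit)))   # connect to the same literal
    return vertices, edges, len(variables) + 2 * len(formula)

def cover_from_assignment(formula, assignment):
    """Turn a satisfying assignment into a vertex cover of size n + 2m."""
    cover = {("var", v if value else -v) for v, value in assignment.items()}
    for c, clause in enumerate(formula):
        # Position of some TRUE literal; its connection is covered by the variable-gadget.
        true_pos = next(pos for pos, lit in enumerate(clause)
                        if assignment[abs(lit)] == (lit > 0))
        cover |= {("clause", c, pos) for pos in range(3) if pos != true_pos}
    return cover

def is_vertex_cover(edges, cover):
    return all(u in cover or v in cover for u, v in edges)

# The formula of Figure 17 with one satisfying assignment (chosen for this example).
formula = [[1, 2, -3], [-1, 2, 4], [-2, 3, 4]]
assignment = {1: True, 2: True, 3: True, 4: False}
vertices, edges, k = vertex_cover_instance(formula)
cover = cover_from_assignment(formula, assignment)
print(len(vertices), len(edges), k, len(cover), is_vertex_cover(edges, cover))
```

The assignment used here is just one satisfying assignment; the red vertices in Figure 17 may correspond to a different one.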

Theorem 9.4. Clique is NP-complete. 

Proof. We first argue that clique is in NP. A certificate for a yes-instance is a subset $V'$
of vertices with $|V'| \ge K$ that forms a clique. To verify this, we just need to check that there
is an edge between every pair of vertices in $V'$. This can be done in $O(n + m)$ time.

We prove that vertex cover $\preceq$ clique in order to establish NP-completeness of clique. We
need the notion of a complement graph for this reduction. Given a graph $G = (V,E)$, the
complement graph of $G$ is defined as the graph $\bar{G} = (V,\bar{E})$ with $(u,v) \in \bar{E}$ if and only if
$(u,v) \notin E$.

Given an instance G=(V,E) with parameter K of vertex cover, we create the complement 
graph of G and let G with parameter n — K be the respective instance of clique. Note that 
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this mapping can be done in polynomial time by adding an edge (u,v) to E for every pair 
of vertices u,v € V with (u,v) ^ E. This takes at most 0{n 2 ) time. 

It remains to show that yes-instances correspond. We claim that $V'$ is a vertex cover in $G$
if and only if $V \setminus V'$ is a clique in $\bar{G}$. $V'$ is a vertex cover in $G$ if and
only if no edge $(u,v) \in E$ has both its endpoints in $V \setminus V'$, or, equivalently, no
pair $(u,v) \notin \bar{E}$ of distinct vertices has both its endpoints in $V \setminus V'$.
The latter statement holds if and only if for every pair of vertices $u,v \in V \setminus V'$
there exists an edge $(u,v) \in \bar{E}$ in $\bar{G}$, which is equivalent to $V \setminus V'$
being a clique of $\bar{G}$. This proves the claim. We conclude that $V'$ is a vertex cover of
size $K$ in $G$ if and only if $V \setminus V'$ is a clique of size $n - K$ in $\bar{G}$. □
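The correspondence used in this proof can be checked mechanically on small graphs. The following is a minimal Python sketch, not part of the original notes; the helper names `complement_graph`, `is_vertex_cover` and `is_clique` are hypothetical, and vertices are assumed to be numbered 0, ..., n-1.

```python
from itertools import combinations


def complement_graph(n, edges):
    """Edge set of the complement of a simple graph on vertices 0..n-1."""
    present = {frozenset(e) for e in edges}
    all_pairs = {frozenset(p) for p in combinations(range(n), 2)}
    return all_pairs - present


def is_vertex_cover(edges, cover):
    """True iff every edge has at least one endpoint in `cover`."""
    return all(u in cover or v in cover for u, v in edges)


def is_clique(edge_set, vertices):
    """True iff every pair of the given vertices is joined by an edge."""
    return all(frozenset(p) in edge_set for p in combinations(vertices, 2))
```

For the path 0-1-2-3, the set {1, 2} is a vertex cover, and the remaining vertices {0, 3} form a clique in the complement graph, as the claim predicts.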

Theorem 9.5. Independent set is NP-complete. 

Proof. We first argue that independent set is in NP. A certificate for a yes-instance is a 
subset V' of vertices that forms an independent set. To verify this, we just need to check 
that there is no edge between any pair of vertices in V'. This can be done in O(n + m)
time.

We prove that clique $\preceq$ independent set in order to establish NP-completeness of
independent set. We need the notion of a complement graph for this reduction. Given a graph
$G = (V,E)$, the complement graph of $G$ is defined as the graph $\bar{G} = (V,\bar{E})$ with
$(u,v) \in \bar{E}$ if and only if $(u,v) \notin E$.

Given an instance $G = (V,E)$ with parameter $K$ of clique, we create the complement graph
$\bar{G}$ of $G$ and let $\bar{G}$ with parameter $K$ be the respective instance of independent
set. Note that this mapping can be done in polynomial time by adding an edge $(u,v)$ to
$\bar{E}$ for every pair of vertices $u,v \in V$ with $(u,v) \notin E$. This takes at most
$O(n^2)$ time.

It remains to show that yes-instances correspond. We claim that $V'$ is a clique of $G$ if and
only if $V'$ is an independent set of $\bar{G}$. Note that $V'$ is a clique of $G$ if and only
if for each pair of vertices in $V'$ there is an edge in $E$. The latter is true if and only if
for each pair of vertices in $V'$ there is no edge in $\bar{E}$, which is equivalent to $V'$
being an independent set of $\bar{G}$. This proves the claim. We conclude that $V'$ is a clique
of size $K$ in $G$ if and only if $V'$ is an independent set of size $K$ in $\bar{G}$. □

Theorem 9.6. Hamiltonian cycle is NP-complete. 

The proof follows by reducing 3-SAT to Hamiltonian cycle. The reader is referred to [6, 
Theorem 15.6]. 

Theorem 9.7. TSP is NP-complete. 

Proof. We argued before that TSP is in NP. The proof now follows trivially because
Hamiltonian cycle is a special case of TSP: Given an instance $G = (V,E)$ of Hamiltonian
cycle we construct an instance of TSP as follows: Let $G' = (V,E')$ be the complete graph
on $V$ and define $d_e = 1$ if $e \in E$ and $d_e = 2$ otherwise. Now a tour in $G'$ of length
at most $K = n$ corresponds to a Hamiltonian cycle in $G$ and vice versa. □

The above proof actually shows that the restriction of TSP in which all distances are either
1 or 2 is NP-complete. Because TSP is in NP and it is a generalization of this problem,
NP-completeness of TSP follows immediately. The same holds true for satisfiability:
If we did not know that SAT is NP-complete but knew that 3-SAT is NP-complete, then
NP-completeness of SAT would follow automatically from the fact that 3-SAT is a special
case of SAT. While restrictions can create an easier subclass of problem instances,
generalizations never create easier problems. This sometimes gives an easy way to show
NP-completeness of a problem.
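The reduction from Hamiltonian cycle to TSP with distances 1 and 2 can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, vertices are assumed to be numbered 0, ..., n-1 with n at least 3, and the brute-force tour search is exponential and only meant for checking tiny instances.

```python
from itertools import combinations, permutations


def tsp_instance_from_graph(n, edges):
    """Distances on the complete graph: 1 for edges of G, 2 for non-edges."""
    present = {frozenset(e) for e in edges}
    return {frozenset(p): (1 if frozenset(p) in present else 2)
            for p in combinations(range(n), 2)}


def shortest_tour_length(n, dist):
    """Length of a shortest tour, by brute force over all tours through vertex 0."""
    best = None
    for perm in permutations(range(1, n)):
        tour = (0,) + perm
        length = sum(dist[frozenset((tour[i], tour[(i + 1) % n]))]
                     for i in range(n))
        if best is None or length < best:
            best = length
    return best


def has_hamiltonian_cycle(n, edges):
    """G has a Hamiltonian cycle iff the TSP instance has a tour of length at most n."""
    return shortest_tour_length(n, tsp_instance_from_graph(n, edges)) <= n
```

On the 4-cycle the shortest tour has length 4 = n, whereas on a star with three leaves every tour must use two non-edges and has length 6 > n.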



We list some more NP-complete problems (without proof) that are often used in reductions.

2-Partition:

Given: Integers $s_1, \dots, s_n$.
Goal: Decide whether there is a set $S \subseteq \{1, \dots, n\}$ such that
$\sum_{i \in S} s_i = \frac{1}{2} \sum_{i=1}^{n} s_i$.

3-Partition:

Given: Rational numbers $s_1, \dots, s_{3n}$ with $\frac{1}{4} < s_i < \frac{1}{2}$ for every
$i = 1, \dots, 3n$.
Goal: Determine whether the set $\{1, \dots, 3n\}$ can be partitioned into $n$ triplets
$S_1, \dots, S_n$ such that $\sum_{i \in S_k} s_i = 1$ for every $k = 1, \dots, n$.

Set Cover:

Given: A universe $U = \{1, \dots, n\}$ of $n$ elements, a family of $m$ subsets
$S_1, \dots, S_m \subseteq U$ and an integer $K$.
Goal: Determine whether there is a selection of at most $K$ subsets such that
their union is $U$.



9.5 More on Complexity Theory 
9.5.1 NP-hard Problems

Sometimes we may be unable to prove that a problem $\Pi$ is in NP but nevertheless can
show that all problems in NP are reducible to $\Pi$. According to Definition 9.5, $\Pi$ does
not qualify as an NP-complete problem because it is not (known to be) in NP. Yet, $\Pi$ is as
hard as any other problem in NP and thus probably a difficult problem. We call such a problem
NP-hard. An example of such a problem is the Lth heaviest subset problem:

Lth Heaviest Subset Problem: 

Given: Integers $w_1, \dots, w_n$, $L$ and a parameter $K$.

Goal: Determine whether the weight of the Lth heaviest subset of $\{1, \dots, n\}$
is at least $K$. (Formally, determine whether there are $L$ distinct subsets
$S_1, \dots, S_L \subseteq \{1, \dots, n\}$ such that $w(S_i) = \sum_{j \in S_i} w_j \ge K$
for every $i = 1, \dots, L$.)

It can be proven that all problems in NP are polynomial-time reducible to the Lth Heaviest
Subset problem (see [6, Theorem 16.8]). However, no proof is known that short certificates
exist for yes-instances. How else could we provide a certificate for a yes-instance
other than explicitly listing $L$ subsets of weight at least $K$? (Note that this is not a
short certificate because $L$ can be exponential in $n$.)

9.5.2 Complexity Class co-NP 

Another complexity class that is related to NP is the class co-NP (which stands for
complement of NP). Here one considers complements of decision problems. As an example,
the complement of Hamiltonian cycle reads as follows:

Hamiltonian Cycle Complement:

Given: An undirected graph G = (V,E).

Goal: Determine whether G does not contain a Hamiltonian cycle.

There are no short certificates known for yes-instances of this problem.

Definition 9.6. A decision problem $\Pi$ belongs to the class co-NP if and only if its
complement belongs to the class NP. Said differently, a decision problem belongs to co-NP if
every no-instance $I \in \mathcal{I}$ admits a certificate whose validity can be verified in
polynomial time.

It is not hard to see that every problem in P also belongs to co-NP. Thus,
$P \subseteq NP \cap \text{co-}NP$. Similar to the $P \neq NP$ conjecture, it is widely
believed that $NP \neq \text{co-}NP$.

Theorem 9.8. If the complement of an NP-complete problem is in NP, then NP = co-NP. 

Proof. Assume that the complement $\bar{\Pi}_2$ of an NP-complete problem $\Pi_2$ is in NP. We
will show that the complement $\bar{\Pi}_1$ of an arbitrary problem $\Pi_1 \in NP$ is also in
NP, thus showing that NP = co-NP.

Because $\Pi_2$ is NP-complete, we know that $\Pi_1$ is polynomial-time reducible to $\Pi_2$.
Note that the reduction $\varphi$ from $\Pi_1$ to $\Pi_2$ is also a polynomial-time reduction
from $\bar{\Pi}_1$ to $\bar{\Pi}_2$. We can therefore exhibit a short certificate for every
yes-instance $I_1$ of $\bar{\Pi}_1$ as follows: We first transform $I_1$ to
$I_2 = \varphi(I_1)$ and then use the short certificate for the yes-instance $I_2$ of
$\bar{\Pi}_2$ (which must exist because $\bar{\Pi}_2 \in NP$). We conclude that $\bar{\Pi}_1$
is in NP, which finishes the proof. □

Note that the above theorem also implies that if the complement of a problem in NP is
also in NP then (unless NP = co-NP) this problem is not NP-complete. Said differently,
a problem that belongs to $NP \cap \text{co-}NP$ is unlikely to be NP-complete. As an example,
consider the linear programming problem LP. Using duality theory, it is not hard to see
that $LP \in NP \cap \text{co-}NP$. Before LP was known to be polynomial-time solvable, it was
in fact the above observation that gave strong evidence to the conjecture that $LP \in P$.

Exercise 9.1. Show that $LP \in NP \cap \text{co-}NP$.

9.5.3 Pseudo-polynomiality and Strong NP-completeness



Sometimes the running time of an algorithm is polynomial in the size of the instance and 
the largest number in the input. As an example, consider the integer knapsack problem. 

Integer Knapsack Problem: 

Given: Integers $c_1, \dots, c_n$ and a parameter $K$.

Goal: Determine whether there exist integers $x_1, \dots, x_n$ such that
$\sum_{k=1}^{n} c_k x_k = K$.

This problem can be solved as follows: Create a directed graph $G = (V,A)$ with $K + 1$
vertices $V = \{0, 1, \dots, K\}$ and $O(nK)$ arcs:

$A = \{(i,j) \mid 0 \le i < j \le K \text{ and } j = i + c_k \text{ for some } k\}$.

It is not hard to prove that an instance $I$ of the integer knapsack problem is a yes-instance
if and only if there exists a path from 0 to $K$ in $G$. The latter problem can be solved in
time $O(|V| + |A|) = O(nK)$. This running time is not polynomial in the size of $I$. To see
this, recall that we defined the size $|I|$ of $I$ to be the number of bits that are needed to
represent $I$ in binary. Making the (reasonable) assumption that $c_1, \dots, c_n \le K$, the
size of $I$ is therefore at most $O(n \log K)$, whereas $K$ itself can be exponential in
$\log K$. The running time of the algorithm is therefore not polynomially bounded in general.
However, if $K$ is bounded by a polynomial function of $n$ then the algorithm is a
polynomial-time algorithm. That is, depending on the application the above algorithm might
indeed be considered to be reasonably efficient, despite the fact that the problem is
NP-complete (which it is).
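The path argument above can be sketched as a reachability computation over the vertices 0, ..., K. This is a minimal Python sketch, not taken from the notes; the function name `integer_knapsack` is hypothetical, and it assumes positive integers c_k and non-negative multiplicities x_k, which is what the arcs (i, i + c_k) with i < j encode.

```python
def integer_knapsack(costs, K):
    """Decide whether K = sum_k c_k * x_k for some non-negative integers x_k.

    reachable[j] is True iff vertex j can be reached from vertex 0 in the
    directed graph with arcs (i, i + c_k). The loop runs in O(n * K) time,
    which is pseudo-polynomial: polynomial in n and K, not in log K.
    """
    reachable = [False] * (K + 1)
    reachable[0] = True
    for j in range(1, K + 1):
        # j is reachable iff some arc (j - c, j) starts at a reachable vertex.
        reachable[j] = any(0 < c <= j and reachable[j - c] for c in costs)
    return reachable[K]
```

For example, with costs 3 and 5 the value 7 cannot be represented while 8 = 3 + 5 can.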

The above observation gives rise to the following definition. Given an instance $I$, let
num($I$) refer to the largest integer appearing in $I$.

Definition 9.7. An algorithm ALG for a problem $\Pi$ is pseudo-polynomial if it solves every
instance $I \in \mathcal{I}$ of $\Pi$ in time bounded by a polynomial function in $|I|$ and
num($I$).

A problem that remains NP-complete even if the largest integer appearing in its description
is bounded polynomially in the size of the instance is called strongly NP-complete.

Definition 9.8. A problem $\Pi$ is strongly NP-complete if the restriction of $\Pi$ to instances
$I \in \mathcal{I}$ satisfying that num($I$) is polynomially bounded in $|I|$ is NP-complete.

Note that many problems that we showed to be NP-complete do not involve any numerical
data that is larger than the input size itself. For example, all graph problems such as
Hamiltonian cycle, clique, independent set, vertex cover, etc. satisfy num($I$) = $O(n)$ and
are therefore even strongly NP-complete by definition. In fact, TSP is also strongly
NP-complete because we established NP-completeness even for instances with distances 1 or
2.

As the theorem below shows, we cannot expect to find a pseudo-polynomial algorithm for 
a strongly NP-complete problem (unless P = NP). 

Theorem 9.9. There does not exist a pseudo-polynomial algorithm for a strongly NP-complete
problem, unless P = NP.

Proof. Let $\Pi$ be a strongly NP-complete problem and suppose that ALG is a
pseudo-polynomial algorithm for $\Pi$. Consider the restriction $\bar{\Pi}$ of $\Pi$ to
instances $I \in \mathcal{I}$ that satisfy that num($I$) is polynomially bounded in $|I|$. By
Definition 9.8, $\bar{\Pi}$ is NP-complete. But ALG can solve every instance $I$ of $\bar{\Pi}$
in time polynomial in $|I|$ and num($I$), which is polynomial in $|I|$. This is impossible
unless P = NP. □

9.5.4 Complexity Class PSPACE 

The common criterion that we used to define the complexity classes P and NP was time: P 
refers to the set of problems that are solvable in polynomial time; NP contains all problems 
for which yes-instances can be verified in polynomial time. There are other complexity 
classes that focus on the criterion space instead: The complexity class PSPACE refers to 
the set of problems for which algorithms exist that only require a polynomial amount of 
space (in the size of the input). 

Definition 9.9. A decision problem $\Pi$ belongs to the complexity class PSPACE if there
exists an algorithm that for every instance $I \in \mathcal{I}$ determines whether $I$ is a
yes-instance or a no-instance using space that is polynomially bounded in the size of $I$.

Clearly, no polynomial-time algorithm can consume more than polynomial space
and thus $P \subseteq PSPACE$. However, even exponential-time algorithms are feasible as
long as they only require polynomial space. We can use this observation to see that
$NP \subseteq PSPACE$: Consider an arbitrary problem $\Pi$ in NP. We know that every
yes-instance of $\Pi$ admits a short certificate. We can therefore generate all potential short
certificates one after another and verify the validity of each one. If we encounter a valid 
certificate throughout this procedure then we report that the instance is a yes-instance; 
otherwise, we report that it is a no-instance. The algorithm may take exponential time be- 
cause the number of certificates to be checked might be exponential. However, it can be 
implemented to use only polynomial space by deleting the previously generated certificate 
each time. 
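As an illustration of this argument, the sketch below decides vertex cover by generating candidate certificates (vertex subsets of size at most K) one after another and verifying each. It is only a sketch with a hypothetical function name: the running time is exponential, but `itertools.combinations` produces the candidates lazily, so only one candidate is held in memory at a time.

```python
from itertools import combinations


def has_vertex_cover(n, edges, K):
    """Decide whether the graph on vertices 0..n-1 has a vertex cover of size <= K."""
    for size in range(K + 1):
        # Candidates are generated one at a time; the previous one is discarded.
        for candidate in combinations(range(n), size):
            chosen = set(candidate)
            # Polynomial-time verification of the candidate certificate.
            if all(u in chosen or v in chosen for u, v in edges):
                return True
    return False
```

For a triangle, no single vertex covers all three edges, but any two vertices do.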

As a final remark: We actually just got a tiny glimpse of the many existing com- 
plexity classes. There is a whole "zoo" of complexity classes; see, for example, the 
wiki page http://qwiki.stanford.edu/index.php/ComplexityJZoo if you want to learn more 
about many other complexity classes and their relations. 

References 

The presentation of the material in this section is based on [6, Chapters 8, 15 & 16].

10. Approximation Algorithms



10.1 Introduction 

As we have seen, many combinatorial optimization problems are NP-hard and thus there
is very little hope that we will be able to develop efficient algorithms for these problems.
Nevertheless, many of these problems are fundamental and solving them is of great
importance. There are various ways to cope with these hardness results:

1. Exponential Algorithms: Certainly, using an algorithm whose running time is
exponential in the worst case might not be too bad after all if we only insist on
solving instances of small to moderate size.

2. Approximation Algorithms: Approximation algorithms are efficient algorithms 
that compute suboptimal solutions with a provable approximation guarantee. That 
is, here we insist on polynomial-time computation but relax the condition that the 
algorithm has to find an optimal solution by requiring that it computes a feasible 
solution that is "close" to optimal. 

3. Heuristics: Any approach that solves the problem without a formal guarantee on 
the quality of the solution can be considered as a heuristic for the problem. Some 
heuristics provide very good solutions in practice. An example of such an ap- 
proach is local search: Start with an arbitrary solution and perform local improve- 
ment steps until no further improvement is possible. Moreover, heuristics are often 
practically appealing because they are simple and thus easy to implement. 

We give some more remarks: 

Some algorithms might perform very well in practice even though their worst-case run- 
ning time is exponential. The simplex algorithm solving the linear programming problem 
is an example of such an algorithm. Most real-world instances do not correspond to worst- 
case instances and thus "typically" the algorithms' performance in practice is rather good. 
In a way, the worst-case running time viewpoint is overly pessimistic in this situation. 

A very successful approach to attack optimization problems originating from practical
applications is to formulate the problem as an integer linear programming (ILP) problem
and to solve the program with ILP solvers such as CPLEX. Such solvers are nowadays very
efficient and are capable of solving large instances. Constructing the right ILP method for
solving a given problem is a matter of smart engineering. Some ILP problems can be
solved by just running an ILP solver; others can only be solved with the help of more
sophisticated methods such as branch-and-bound, cutting-plane, column generation, etc.
Rostering problems in particular, such as class schedules at universities or personnel
schedules in hospitals, are notorious for being extremely hard to solve, even for small sizes.
Solving ILP problems is an art that can be learned only in practice.

Here we will focus on approximation algorithms in order to cope with NP-hardness of
problems. We give a formal definition of these algorithms first.

Definition 10.1. An algorithm ALG for a minimization problem $\Pi$ is an
$\alpha$-approximation algorithm with $\alpha \ge 1$ if it computes for every instance
$I \in \mathcal{I}$ in polynomial time a feasible solution $S \in \mathcal{F}$ whose cost
$c(S)$ is at most $\alpha$ times the cost OPT($I$) of an optimal solution for $I$, i.e.,
$c(S) \le \alpha \cdot \mathrm{OPT}(I)$.

The definition is similar for maximization problems. Here it is more natural to assume
that we want to maximize a weight (or profit) function $w : \mathcal{F} \to \mathbb{R}$ that
maps every feasible solution $S \in \mathcal{F}$ of an instance $I \in \mathcal{I}$ to some
real value.

Definition 10.2. An algorithm ALG for a maximization problem $\Pi$ is an
$\alpha$-approximation algorithm with $\alpha \ge 1$ if it computes for every instance
$I \in \mathcal{I}$ in polynomial time a feasible solution $S \in \mathcal{F}$ whose weight
(or profit) $w(S)$ is at least $\frac{1}{\alpha}$ times the weight OPT($I$) of an optimal
(i.e., maximum weight) solution for $I$, i.e.,
$w(S) \ge \frac{1}{\alpha} \cdot \mathrm{OPT}(I)$.

Note that we would like to design approximation algorithms with the approximation ratio
$\alpha$ being as small as possible. A lot of research in theoretical computer science and
discrete mathematics is dedicated to finding "good" approximation algorithms for
combinatorial optimization problems.

10.2 Approximation Algorithm for Vertex Cover 

We start with an easy approximation algorithm for the vertex cover problem, which has
been introduced before: Given a graph $G = (V,E)$, find a vertex cover $V' \subseteq V$ of
smallest cardinality. Recall that we showed that the decision variant of vertex cover is
NP-complete.

One of the major difficulties in the design of approximation algorithms is to come up with
a good estimate for the optimal solution cost OPT($I$). (We will omit $I$ subsequently.)
Recall that a matching M is a subset of the edges having the property that no two edges 
share a common endpoint. We call a matching M maximum if the cardinality of M is 
maximum; we call it maximal if it is inclusion-wise maximal, i.e., we cannot add another 
edge to M without rendering it infeasible. Note that a maximum matching is a maximal 
one but not vice versa. 

Lemma 10.1. Let $G = (V,E)$ be an undirected graph. If $M$ is a matching of $G$ then
$\mathrm{OPT} \ge |M|$.

Proof. Consider an arbitrary vertex cover $V'$ of $G$. Every matching edge $(u,v) \in M$ must
be covered by at least one vertex in $V'$, i.e., $\{u,v\} \cap V' \neq \emptyset$. Because the
edges in $M$ do not share any endpoints, we have $|V'| \ge |M|$. □

We conclude that we can derive an easy 2-approximation algorithm for vertex cover as
follows:

Input: Undirected graph $G = (V,E)$.
Output: Vertex cover $V' \subseteq V$.

1 Find a maximal matching $M$ of $G$.
2 Output the set $V'$ of matched vertices.

Algorithm 14: Approximation algorithm for vertex cover.

Theorem 10.1. Algorithm 14 is a 2-approximation algorithm for vertex cover.

Proof. Clearly, the running time of Algorithm 14 is polynomial because we can find a
maximal matching in time at most $O(n + m)$. The algorithm outputs a feasible vertex
cover because of the maximality of $M$. To see this, suppose that the resulting set $V'$ is
not a vertex cover. Then there is an edge $(u,v)$ with $u,v \notin V'$ and thus both $u$ and
$v$ are unmatched in $M$. We can then add the edge $(u,v)$ to $M$ and obtain a feasible
matching, which contradicts the maximality of $M$. Finally, observe that
$|V'| = 2|M| \le 2\,\mathrm{OPT}$ by Lemma 10.1. □

Note that it suffices to compute a maximal (not necessarily maximum) matching in
Algorithm 14, which can be done in linear time $O(m)$.
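A maximal matching can be built greedily in a single pass over the edges, which gives the following minimal Python sketch of Algorithm 14. The function name and the edge-list input format are assumptions made for illustration; the graph is assumed to be simple (no self-loops).

```python
def vertex_cover_2approx(edges):
    """Greedy maximal matching; returns (cover, matching).

    An edge joins the matching iff both endpoints are still unmatched, so
    after the pass no edge can be added, i.e. the matching is maximal.
    The cover consists of all matched vertices, hence |cover| = 2 |matching|.
    """
    cover = set()
    matching = []
    for u, v in edges:
        if u not in cover and v not in cover:
            matching.append((u, v))
            cover.update((u, v))
    return cover, matching
```

On a complete bipartite graph with three vertices per side, the sketch matches three disjoint edges and returns all six vertices, while one side alone (three vertices) already covers every edge.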

An immediate question that comes to one's mind is whether the approximation ratio is best
possible. This indeed involves two kinds of questions in general:

1. Is the approximation ratio $\alpha$ of the algorithm tight?

2. Is the approximation ratio $\alpha$ of the algorithm best possible for vertex cover?

The first question essentially asks whether the analysis of the approximation ratio is tight.
This is usually answered by exhibiting an example instance for which the algorithm
computes a solution whose cost is $\alpha$ times the optimal one. The second one asks for much
more: Can one show that there is no approximation algorithm with approximation ratio
$\alpha - \varepsilon$ for any $\varepsilon > 0$? Such an inapproximability result usually
relies on some conjecture such as $P \neq NP$.

Let us first argue that the approximation ratio of Algorithm 14 is indeed tight.

Example 10.1. Consider a complete bipartite graph with n vertices on each side. The 
above algorithm will pick all 2n vertices, while picking one side of the bipartition con- 
stitutes an optimal solution of cardinality n. The approximation ratio of 2 is therefore 
tight. 

The answer to the second question is not clear, despite intensive research. The currently 
best known lower bound on the inapproximability of vertex cover is as follows (stated 
without proof). 

Theorem 10.2. Vertex cover cannot be approximated within a factor of 1.3606, unless 
P=NP. 

10.3 Approximation Algorithms for TSP 

As introduced before, the traveling salesman problem asks for the computation of a
shortest tour in a given graph $G = (V,E)$ with non-negative edge costs
$c : E \to \mathbb{R}^+$.

We first show the following inapproximability result.

Theorem 10.3. For any polynomial-time computable function $\alpha(n)$, TSP cannot be
approximated within a factor of $\alpha(n)$, unless P = NP.

Proof. Suppose we have an algorithm ALG that approximates TSP within a factor $\alpha(n)$.
We show that we can use ALG to decide in polynomial time whether a given graph has a
Hamiltonian cycle or not, which is impossible unless P = NP.

Let $G = (V,E)$ be a given graph on $n$ vertices. We extend $G$ to a complete graph and
assign each original edge a cost of 1 and every other edge a cost of $n\alpha(n)$. Run the
$\alpha(n)$-approximation algorithm ALG on the resulting instance. We claim that $G$ contains
a Hamiltonian cycle if and only if the TSP tour computed by ALG has cost less than or
equal to $n\alpha(n)$.

Suppose $G$ has a Hamiltonian cycle. Then the optimal TSP tour in the extended graph
has cost $n$. The approximate TSP tour computed by ALG must therefore have cost less
than or equal to $n\alpha(n)$. Suppose $G$ does not contain a Hamiltonian cycle. Then every
feasible TSP tour in the extended graph must use at least one edge of cost $n\alpha(n)$, i.e.,
the cost of the tour is greater than $n\alpha(n)$ (assuming that $G$ has $n \ge 2$ vertices).
Thus, the cost of the approximate TSP tour computed by ALG is greater than $n\alpha(n)$. The
claim follows. □
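The gap exploited in this proof can be written out explicitly. In the display below, $I$ denotes the extended TSP instance and $\mathrm{ALG}(I)$ the cost of the tour returned by ALG; both are notation introduced here only for illustration. In the second case, the tour uses at least one edge of cost $n\alpha(n)$ and $n-1$ further edges of cost at least 1 each.

```latex
\begin{align*}
G \text{ has a Hamiltonian cycle}
  &\;\Longrightarrow\; \mathrm{OPT}(I) = n
   \;\Longrightarrow\; \mathrm{ALG}(I) \le \alpha(n)\cdot n, \\
G \text{ has no Hamiltonian cycle}
  &\;\Longrightarrow\; \mathrm{ALG}(I) \ge \mathrm{OPT}(I)
   \ge n\alpha(n) + (n-1) > n\alpha(n).
\end{align*}
```

The two ranges are disjoint for $n \ge 2$, so comparing $\mathrm{ALG}(I)$ with the threshold $n\alpha(n)$ decides Hamiltonian cycle.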

The above inapproximability result is extremely bad news. The situation changes if we 
consider the metric TSP problem. 

Metric Traveling Salesman Problem (Metric TSP): 

Given: An undirected complete graph $G = (V,E)$ with non-negative costs
$c : E \to \mathbb{R}^+$ satisfying the triangle inequality, i.e., for every $u,v,w \in V$,
$c_{uw} \le c_{uv} + c_{vw}$.

Goal: Compute a tour in $G$ that minimizes the total cost.

The metric TSP problem remains NP-complete: Recall that we showed that the TSP problem
is NP-complete by reducing Hamiltonian cycle to this problem. The reduction only
used edge costs 1 and 2. Note that such edge costs always constitute a metric. Thus, the
same proof shows that metric TSP is NP-complete.

We next derive two constant factor approximation algorithms for this problem. 

Given a subset $Q \subseteq E$ of the edges, we define $c(Q)$ as the total cost of all edges
in $Q$, i.e., $c(Q) = \sum_{e \in Q} c_e$.

The following lemma establishes a lower bound on the optimal cost:

Lemma 10.2. Let $T$ be a minimum spanning tree of $G$. Then $\mathrm{OPT} \ge c(T)$.

Proof. Consider an optimal TSP tour and remove an arbitrary edge from this tour. Because
edge costs are non-negative, we obtain a spanning tree of $G$ whose cost is at most OPT. The
cost of $T$ is thus at most OPT. □

This lemma leads to the following idea (see footnote 5):



Input: Complete graph $G = (V,E)$ with non-negative edge costs $c : E \to \mathbb{R}^+$
satisfying the triangle inequality.
Output: TSP tour of $G$.

1 Compute a minimum spanning tree $T$ of $G$.
2 Double all edges of $T$ to obtain a Eulerian graph $G'$.
3 Extract a Eulerian tour $C'$ from $G'$.
4 Traverse $C'$ and short-cut previously visited vertices.
5 Output the resulting tour $C$.

Algorithm 15: Approximation algorithm for metric TSP.

Theorem 10.4. Algorithm 15 is a 2-approximation algorithm for metric TSP.

Proof. Note that the algorithm has polynomial running time. Also, the returned tour is a
TSP tour by construction. The Eulerian tour $C'$ uses every edge of $T$ exactly twice and
thus has cost $2c(T)$. Because edge costs satisfy the triangle inequality, the tour $C$
resulting from short-cutting $C'$ in Algorithm 15 has cost at most $2c(T)$,
where $T$ is the minimum spanning tree computed in Step 1. By Lemma 10.2, the cost of
$C$ is thus at most $2\,\mathrm{OPT}$. □
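The steps of Algorithm 15 can be sketched in Python as follows. The sketch relies on one observation that is an assumption of this illustration rather than a statement from the notes: walking the doubled tree in depth-first order is a Eulerian tour, and short-cutting it yields exactly the depth-first preorder of the tree. The function names and the distance-matrix input are hypothetical.

```python
def mst_prim(dist):
    """Prim's algorithm on a complete graph given by a symmetric matrix.

    Returns the minimum spanning tree as adjacency lists.
    """
    n = len(dist)
    in_tree = [False] * n
    best = [float("inf")] * n
    parent = [None] * n
    best[0] = 0
    adj = [[] for _ in range(n)]
    for _ in range(n):
        # Pick the cheapest vertex not yet in the tree.
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if parent[u] is not None:
            adj[parent[u]].append(u)
            adj[u].append(parent[u])
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
                parent[v] = u
    return adj


def double_tree_tour(dist):
    """Visit the vertices in depth-first preorder of the minimum spanning tree."""
    adj = mst_prim(dist)
    tour, seen, stack = [], set(), [0]
    while stack:
        u = stack.pop()
        if u in seen:
            continue  # short-cut: skip vertices that were already visited
        seen.add(u)
        tour.append(u)
        stack.extend(reversed(adj[u]))
    return tour


def tour_cost(dist, tour):
    """Total cost of the closed tour, including the edge back to the start."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))
```

The input must satisfy the triangle inequality for the factor-2 guarantee to hold; the code itself does not check this.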

We can actually derive a better approximation algorithm by refining the idea of
Algorithm 15. Note that the reason for doubling the edges of a minimum spanning tree $T$ was
that we would like to obtain a Eulerian graph from which we can then extract a Eulerian
tour. Are there better ways to construct a Eulerian graph starting with a minimum
spanning tree $T$? Certainly, we only have to take care of the odd degree vertices, say $V'$,
of $T$. Note that in a tree there must be an even number of odd degree vertices.

So one way of making these odd degree vertices become even degree vertices is to add
the edges of a perfect matching on $V'$ to $T$. Intuitively, we would like to keep the total
cost of the augmented tree small and thus compute a minimum cost perfect matching. As
the following lemma shows, the cost of this matching can be related to the optimal cost.

Lemma 10.3. Let $V' \subseteq V$ be a subset containing an even number of vertices. Let $M$ be
a minimum cost perfect matching on $V'$. Then $\mathrm{OPT} \ge 2c(M)$.

Proof. Consider an optimal TSP tour $C$ of length OPT. Traverse this tour and short-cut all
vertices in $V \setminus V'$. Because of the triangle inequality, the resulting tour $C'$ on
$V'$ has length at most OPT. Since $|V'|$ is even, $C'$ can be seen as the union of two perfect
matchings on $V'$ (take every other edge of $C'$). The cheaper matching of these two must have
cost at most $\frac{1}{2}\mathrm{OPT}$. We conclude that a minimum cost perfect matching $M$ on
$V'$ has cost at most $\frac{1}{2}\mathrm{OPT}$. □

We combine the above observations in the following algorithm, which is also known as
Christofides' algorithm.

Footnote 5. Recall that a Eulerian graph is a connected graph that has no vertices of odd
degree. A Eulerian tour is a cycle that visits every edge of the graph exactly once. Given a
Eulerian graph, we can always find a Eulerian tour.

Input: Complete graph $G = (V,E)$ with non-negative edge costs $c : E \to \mathbb{R}^+$
satisfying the triangle inequality.
Output: TSP tour of $G$.

1 Compute a minimum spanning tree $T$ of $G$.
2 Compute a minimum cost perfect matching $M$ on the odd degree vertices $V'$ of $T$.
3 Combine $T$ and $M$ to obtain a Eulerian graph $G'$.
4 Extract a Eulerian tour $C'$ from $G'$.
5 Traverse $C'$ and short-cut previously visited vertices.
6 Output the resulting tour $C$.

Algorithm 16: Approximation algorithm for metric TSP.

Theorem 10.5. Algorithm 16 is a $\frac{3}{2}$-approximation algorithm for metric TSP.

Proof. Note that the algorithm can be implemented to run in polynomial time (computing
a minimum cost perfect matching in an undirected graph can be done in polynomial time). The
proof follows because the Eulerian graph $G'$ has total cost $c(T) + c(M)$. Because of the
triangle inequality, short-cutting the Eulerian tour $C'$ does not increase the cost. The
resulting tour $C$ has thus cost at most $c(T) + c(M)$, which by Lemmas 10.2 and 10.3 is at
most $\frac{3}{2}\mathrm{OPT}$. □
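Written as a single chain of inequalities, the argument combines the two lower bounds $\mathrm{OPT} \ge c(T)$ (Lemma 10.2) and $\mathrm{OPT} \ge 2c(M)$ (Lemma 10.3):

```latex
\[
c(C) \;\le\; c(C') \;=\; c(T) + c(M)
     \;\le\; \mathrm{OPT} + \tfrac{1}{2}\,\mathrm{OPT}
     \;=\; \tfrac{3}{2}\,\mathrm{OPT}.
\]
```

The first inequality is the short-cutting step, which uses the triangle inequality.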

The algorithm is tight (example omitted). Despite intensive research efforts, this is still 
the best known approximation algorithm for the metric TSP problem. 

10.4 Approximation Algorithm for Steiner Tree 

We next consider a fundamental network design problem, namely the Steiner tree prob- 
lem. It naturally generalizes the minimum spanning tree problem: 

Steiner Tree Problem:

Given: An undirected graph $G = (V,E)$ with non-negative edge costs
$c : E \to \mathbb{R}^+$ and a set of terminal nodes $R \subseteq V$.
Goal: Compute a minimum cost tree $T$ in $G$ that connects all terminals in $R$.

The nodes in $R$ are usually called terminals; those in $V \setminus R$ are called Steiner
nodes. The Steiner tree problem thus asks for the computation of a minimum cost tree, also
called Steiner tree, that spans all terminals in $R$ and possibly some Steiner nodes. The
decision variant of the problem is NP-complete. Note that if we knew the set
$S \subseteq V \setminus R$ of Steiner nodes that are included in an optimal solution, then we
could simply compute an optimal Steiner tree by computing a minimum spanning tree on the vertex
set $R \cup S$ in $G$. Thus, the difficulty of the problem is that we do not know which Steiner
nodes to include.

We first show that we can restrict our attention without loss of generality to the so-called
metric Steiner tree problem. In the metric version of the problem, we are given a complete
graph $G = (V,E)$ with non-negative edge costs $c : E \to \mathbb{R}^+$ that satisfy the
triangle inequality, i.e., for every $u,v,w \in V$, $c_{uw} \le c_{uv} + c_{vw}$.

Given a subset $Q \subseteq E$ of the edges, we define $c(Q)$ as the total cost of all edges
in $Q$, i.e., $c(Q) = \sum_{e \in Q} c_e$.

Theorem 10.6. There is an approximation preserving polynomial-time reduction from 
Steiner tree to metric Steiner tree. 

Proof. Consider an instance $I = (G, c, R)$ of the Steiner tree problem consisting of a graph
$G = (V,E)$ and edge costs $c : E \to \mathbb{R}^+$. We construct a corresponding instance
$I' = (G', c', R)$ of metric Steiner tree as follows. Let $G' = (V,E')$ be the complete
undirected graph on vertex set $V$ and let $E'$ be its set of edges. Define the cost $c'_e$ of
edge $e = (u,v) \in E'$ as the cost of a shortest path between $u$ and $v$ in $G$. $(G',c')$ is
called the metric closure of $(G,c)$. The set of terminals in $I$ and $I'$ is identical.

Suppose we are given a Steiner tree $T$ in $G$. Then the cost of the Steiner tree $T$ in
$(G',c')$ can only be smaller or equal, because $c'_e \le c_e$ for every edge $e \in E$. Next
suppose we are given a Steiner tree $T'$ of $(G',c')$. Each edge $e = (u,v) \in T'$ corresponds
to a shortest $u,v$-path in $G$. The subgraph of $G$ formed by the union of these paths over
all edges in $T'$ connects all terminals in $R$ and has cost at most $c'(T')$ but may contain
cycles in general. If so, remove edges to obtain a tree $T$ in $G$. Clearly,
$c(T) \le c'(T')$. □

In light of Theorem 10.6, we concentrate on the metric Steiner tree problem subsequently. 

As mentioned before, the key to deriving good approximation algorithms for a problem is
to develop good lower bounds on the optimal cost OPT. One such lower bound is the
following:

Lemma 10.4. Let $T$ be a minimum spanning tree on the terminal set $R$ of $G$. Then
$\mathrm{OPT} \ge \frac{1}{2}c(T)$.

Proof. Consider an optimal Steiner tree of cost OPT. By doubling the edges of this tree,
we obtain a Eulerian graph of cost $2\,\mathrm{OPT}$ that connects all terminals in $R$ and a
(possibly empty) subset of Steiner vertices. Find a Eulerian tour $C'$ in this graph (e.g., by
traversing vertices in their depth-first search order). We obtain a Hamiltonian cycle $C$ on
$R$ by traversing $C'$ and short-cutting Steiner vertices and previously visited terminals.
Because of the triangle inequality, this short-cutting will not increase the cost and the cost
of $C$ is thus at most $c(C') = 2\,\mathrm{OPT}$. Delete an arbitrary edge of $C$ to obtain a
spanning tree on $R$ of cost at most $2\,\mathrm{OPT}$. The cost of a minimum spanning tree $T$
on $R$ is less than or equal to the cost of this spanning tree, which is at most
$2\,\mathrm{OPT}$. □

Lemma 10.4 gives rise to the following approximation algorithm. 

Theorem 10.7. Algorithm 17 is a 2-approximation algorithm for metric Steiner tree.

Proof. Certainly, the algorithm has polynomial running time and outputs a feasible
solution. The approximation ratio of 2 follows directly from Lemma 10.4. □

Input: Complete graph $G = (V,E)$ with non-negative edge costs $c : E \to \mathbb{R}^+$
satisfying the triangle inequality and a set of terminal vertices $R \subseteq V$.
Output: Steiner tree $T$ on $R$.

1 Compute a minimum spanning tree $T$ on terminal set $R$.
2 Output $T$.

Algorithm 17: Approximation algorithm for metric Steiner tree.
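Combining the metric closure from Theorem 10.6 with Algorithm 17 gives the following minimal Python sketch. It is an illustration under stated assumptions: the function names are hypothetical, the metric closure is computed with the Floyd-Warshall algorithm (one of several possible choices), and the returned tree lives in the metric closure, so its edges would still have to be expanded into shortest paths of the original graph.

```python
def metric_closure(n, cost):
    """All-pairs shortest path costs; `cost` maps a pair (u, v) to c_uv."""
    inf = float("inf")
    d = [[0 if u == v else inf for v in range(n)] for u in range(n)]
    for (u, v), c in cost.items():
        d[u][v] = d[v][u] = min(d[u][v], c)
    # Floyd-Warshall: allow w as an intermediate vertex.
    for w in range(n):
        for u in range(n):
            for v in range(n):
                if d[u][w] + d[w][v] < d[u][v]:
                    d[u][v] = d[u][w] + d[w][v]
    return d


def terminal_mst(d, terminals):
    """Prim's algorithm restricted to the terminals; returns (edges, total cost)."""
    terminals = list(terminals)
    in_tree = {terminals[0]}
    edges, total = [], 0
    while len(in_tree) < len(terminals):
        # Cheapest edge leaving the current tree.
        u, v = min(((a, b) for a in in_tree for b in terminals if b not in in_tree),
                   key=lambda e: d[e[0]][e[1]])
        in_tree.add(v)
        edges.append((u, v))
        total += d[u][v]
    return edges, total
```

As a usage example, take a star whose centre 0 is a Steiner vertex joined to terminals 1, ..., k by edges of cost 1. Any two terminals are then at distance 2 in the metric closure, so the terminal tree costs 2(k - 1), compared with cost k for the star itself.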

The analysis of Algorithm 17 is tight, as the following example shows:

Example 10.2. Consider a complete graph that consists of k outer terminal vertices
$R = \{t_1, \dots, t_k\}$ that are connected to one inner Steiner vertex; see Figure 18 for
an example with k = 8. The edges to the Steiner vertex have cost 1; all remaining ones
have cost 2. Note that these edge costs satisfy the triangle inequality. A minimum
spanning tree on the terminals (the gray vertices in the figure) has total cost 2(k − 1),
while the minimum Steiner tree has cost k. That is, the approximation ratio of
Algorithm 17 on this instance is 2 − 2/k. As k goes to infinity, this ratio approaches 2.

Figure 18: Example graph.
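As a quick check of this ratio for the case k = 8:

$$\frac{c(T)}{\mathrm{OPT}} = \frac{2(k-1)}{k} = \frac{14}{8} = 1.75 = 2 - \frac{2}{8}.$$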

There are much better approximation algorithms for this problem. The best approximation
ratio currently known is 1.386. Inapproximability results show that the problem cannot be
approximated arbitrarily well. In particular, there is no 96/95-approximation algorithm
for the metric Steiner tree problem unless P = NP.

10.5 Approximation Scheme for Knapsack 

We next consider the knapsack problem:

Knapsack Problem:

Given: A set $N = \{1, \dots, n\}$ of n items with each item $i \in N$ having a profit
$p_i \in \mathbb{Z}_+$ and a weight $w_i \in \mathbb{Z}_+$, and a knapsack whose (weight)
capacity is $B \in \mathbb{Z}_+$.
Goal: Find a subset $X \subseteq N$ of items whose total weight $w(X) = \sum_{i \in X} w_i$
is at most B such that the total profit $p(X) = \sum_{i \in X} p_i$ is maximum.

We will assume without loss of generality that $w_i \le B$ for every $i \in N$ and that
$p_i > 0$ for every $i \in N$; items not satisfying one of these conditions can safely be
ignored.

The knapsack problem is NP-hard and we therefore seek a good approximation algorithm
for the problem. As we will see, we can even derive an approximation scheme for this
problem:

Definition 10.3. An algorithm ALG is an approximation scheme for a maximization
problem $\Pi$ if for every given error parameter $\varepsilon > 0$ and every instance
$I \in \mathcal{I}$, it computes a feasible solution $S \in \mathcal{F}$ of profit
$p(S) \ge (1 - \varepsilon)\,\mathrm{OPT}(I)$. An approximation scheme ALG is a

• polynomial time approximation scheme (PTAS) if for every fixed $\varepsilon > 0$ its
running time is polynomial in the size of the instance I;

• fully polynomial time approximation scheme (FPTAS) if its running time is polynomial
in the size of the instance I and $1/\varepsilon$.

Note that the running time of a PTAS might grow exponentially in $1/\varepsilon$, e.g.,
like $O(2^{1/\varepsilon} n^2)$, while this is not allowed for an FPTAS.
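To get a feeling for the difference, the following small sketch evaluates the bound $2^{1/\varepsilon} n^2$ from the note above next to a bound of the form $n^3/\varepsilon$, which is polynomial in both n and $1/\varepsilon$. Both functions are illustrative step counts chosen for this comparison (their names are made up here); they are not the running times of particular algorithms.

```python
def ptas_like_steps(n, eps):
    # Bound of the form 2^(1/eps) * n^2: polynomial in n for every fixed eps,
    # but exponential in 1/eps.
    return 2 ** (1 / eps) * n ** 2


def fptas_like_steps(n, eps):
    # Bound of the form n^3 / eps: polynomial in both n and 1/eps.
    return n ** 3 / eps


for eps in (0.5, 0.1, 0.05):
    print(eps, ptas_like_steps(100, eps), fptas_like_steps(100, eps))
```

For n = 100 the first bound is the smaller one at ε = 0.5, but it overtakes the second as ε shrinks, since halving ε squares the factor $2^{1/\varepsilon}$ and only doubles $1/\varepsilon$.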

10.5.1 Dynamic Programming Approach 

The algorithm is based on the following dynamic programming approach. Let the maximum
profit of an item in a given instance I be denoted by P(I) (we will omit I subsequently).
A trivial upper bound on the total profit that any solution for I can achieve is nP.
Define for every $i \in N$ and every $p \in \{0, \dots, nP\}$:

$A(i, p)$ = minimum weight of a subset $S \subseteq \{1, \dots, i\}$ whose profit p(S) is exactly p.

Let $A(i, p) = \infty$ if no such set exists. Suppose we were able to compute $A(i, p)$.
We can then easily determine the total profit of an optimal solution:

$$\mathrm{OPT} = \max\{\, p \in \{0, \dots, nP\} \mid A(n, p) \le B \,\}.$$

Clearly, $A(1, p_1) = w_1$ and $A(1, p) = \infty$ for every p > 0 with $p \ne p_1$. We
further set $A(i, 0) = 0$ for every $i \in N$ and implicitly assume that
$A(i, p) = \infty$ for every p < 0. Let us see how to compute $A(i+1, p)$ for $i \ge 1$
and p > 0. There are two options: either we include item i + 1 into the knapsack or not.
If we include item i + 1, then it contributes a profit of $p_{i+1}$, and thus the minimum
weight of a subset of $\{1, \dots, i+1\}$ with profit p is equal to the minimum weight
$A(i, p - p_{i+1})$ of the first i items yielding profit $p - p_{i+1}$ plus the weight
$w_{i+1}$ of item i + 1. If we do not include item i + 1, then $A(i+1, p)$ is equal to
$A(i, p)$. Thus

$$A(i+1, p) = \min\{\, w_{i+1} + A(i, p - p_{i+1}),\ A(i, p) \,\}. \qquad (17)$$

We can therefore compute the table of entries $A(i, p)$ with $i \in N$ and
$p \in \{0, \dots, nP\}$ in time $O(n^2 P)$.
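Recurrence (17) can be sketched in Python as follows. This is one possible realization under a few assumptions that are not in the text: items are indexed from 0, the function name `knapsack_by_profit` is made up for illustration, and the table gets an extra row for i = 0 (with A(0, 0) = 0 and A(0, p) = ∞ otherwise), which reproduces the base case for i = 1 stated above.

```python
import math


def knapsack_by_profit(profits, weights, capacity):
    """Profit-indexed dynamic program following recurrence (17).

    Returns (best_profit, chosen_items), with items given as 0-based indices.
    """
    n = len(profits)
    max_total = n * max(profits, default=0)  # the trivial upper bound nP
    INF = math.inf
    # table[i][p] plays the role of A(i, p) for the first i items.
    table = [[INF] * (max_total + 1) for _ in range(n + 1)]
    table[0][0] = 0
    for i in range(1, n + 1):
        p_i, w_i = profits[i - 1], weights[i - 1]
        for p in range(max_total + 1):
            skip = table[i - 1][p]
            # A(i-1, p - p_i) counts as infinity when p - p_i < 0.
            take = w_i + table[i - 1][p - p_i] if p >= p_i else INF
            table[i][p] = min(skip, take)
    # OPT = largest profit whose minimum weight fits into the knapsack.
    best = max(p for p in range(max_total + 1) if table[n][p] <= capacity)
    # Walk back through the table to recover one optimal item set.
    chosen, p = [], best
    for i in range(n, 0, -1):
        if table[i][p] != table[i - 1][p]:  # item i was needed for this entry
            chosen.append(i - 1)
            p -= profits[i - 1]
    return best, sorted(chosen)


print(knapsack_by_profit([60, 100, 120], [10, 20, 30], 50))  # (220, [1, 2])
```

The two nested loops fill (n + 1)(nP + 1) entries with constant work each, which matches the $O(n^2 P)$ bound.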

Note that the dynamic program has pseudo-polynomial running time (cf. Definition 9.7), 
i.e., the running time of the algorithm is polynomial in the size of the instance (here n) and 
the largest integer appearing in the instance (here P). However, we can use this pseudo- 
polynomial algorithm in combination with the following rounding idea to obtain a fully 
polynomial-time approximation scheme for this problem. 






Input: A set $N = \{1, \dots, n\}$ of items with a profit $p_i \in \mathbb{Z}_+$ and a
weight $w_i \in \mathbb{Z}_+$ for every item $i \in N$ and a knapsack capacity
$B \in \mathbb{Z}_+$.
Output: Subset $X' \subseteq N$ of items.

1 Set $t = \lfloor \log_{10}(\varepsilon P / n) \rfloor$.

2 Define truncated profits $\bar{p}_i = \lfloor p_i / 10^t \rfloor$ for every $i \in N$.

3 Use the dynamic program (17) to compute an optimal solution X' for the
knapsack instance $(N, (\bar{p}_i), (w_i), B)$.

4 Output X'.

Algorithm 18: Approximation scheme for knapsack.

10.5.2 Deriving a FPTAS for Knapsack

Note that the dynamic program above runs in polynomial time if all profits of the items
are small numbers, e.g., if they are polynomially bounded in n. The key idea behind
deriving an FPTAS is to ignore a certain number (depending on the error parameter
$\varepsilon$) of least significant digits of the items' profits. The modified profits can
be viewed as numbers that are polynomially bounded in |I| and $1/\varepsilon$. As a
consequence, we can compute an optimal solution for the modified profits in time
polynomial in |I| and $1/\varepsilon$ using the above dynamic program. Because we only
ignore the least significant digits, this solution will be a $(1 - \varepsilon)$-approximate
solution with respect to the original profits. Subsequently, we elaborate on this idea in
more detail.

Suppose we truncate the last t digits of each item's profit. That is, define the truncated
profit $\bar{p}_i$ of item i as $\bar{p}_i = \lfloor p_i / 10^t \rfloor$. Now use the
dynamic program above to compute an optimal solution X' for the instance with truncated
profits. This takes time at most $O(n^2 P / 10^t)$.

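Certainly, X' may be sub-optimal for the original problem, but its total profit relates to
that of an optimal solution X for the original problem as follows:

$$\sum_{i \in X'} p_i \;\ge\; \sum_{i \in X'} 10^t \bar{p}_i \;\ge\; \sum_{i \in X} 10^t \bar{p}_i \;\ge\; \sum_{i \in X} (p_i - 10^t) \;\ge\; \sum_{i \in X} p_i - n 10^t.$$

Here, the first and third inequalities hold because of the definition of the truncated
profits $\bar{p}_i = \lfloor p_i / 10^t \rfloor$. The second inequality follows from the
optimality of X' for the truncated profits, and the last one from $|X| \le n$. Thus, the
total profit of X' satisfies

$$p(X') \;\ge\; p(X) - n 10^t \;=\; \mathrm{OPT}\left(1 - \frac{n 10^t}{\mathrm{OPT}}\right) \;\ge\; \mathrm{OPT}\left(1 - \frac{n 10^t}{P}\right).$$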

Note that the last inequality holds because OPT ≥ P: every item fits into the knapsack on
its own. Suppose we wish to obtain an approximation ratio of $1 - \varepsilon$. We can
accomplish this by letting t be the largest integer such that $n 10^t / P \le \varepsilon$,
or, equivalently,

$$t = \left\lfloor \log_{10}\left(\frac{\varepsilon P}{n}\right) \right\rfloor.$$

With this choice we have $10^t > \varepsilon P / (10 n)$, so the running time of the
dynamic program is $O(n^2 P / 10^t) = O(n^3 / \varepsilon)$. That is, for any
$\varepsilon > 0$ we have a $(1 - \varepsilon)$-approximation algorithm whose running time
is polynomial in the size of the instance and $1/\varepsilon$.
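The rounding scheme can be sketched in Python as follows. This is a hedged illustration, not the text's own implementation, and it makes several assumptions: the function names are made up, items are indexed from 0, the exact solver is a dictionary-based variant of the profit-indexed dynamic program, t is found by an integer loop rather than a floating-point logarithm, and t is clamped at 0 (so for $\varepsilon P / n < 1$ the profits are left unchanged and the result is exact).

```python
import math


def exact_by_profit(profits, weights, capacity):
    """Profit-indexed DP; returns the items of a maximum-profit feasible subset."""
    # best[p] = (minimum weight, chain of chosen items) over subsets of profit p
    best = {0: (0, None)}
    for i, (p_i, w_i) in enumerate(zip(profits, weights)):
        # Iterate over a snapshot so that item i is used at most once.
        for p, (w, chain) in list(best.items()):
            q, new_w = p + p_i, w + w_i
            if new_w <= capacity and new_w < best.get(q, (math.inf,))[0]:
                best[q] = (new_w, (i, chain))
    _, chain = best[max(best)]
    items = []
    while chain is not None:  # unpack the linked chain of chosen items
        i, chain = chain
        items.append(i)
    return sorted(items)


def knapsack_fptas(profits, weights, capacity, eps):
    """Truncate the last t decimal digits of each profit, then solve exactly."""
    n, P = len(profits), max(profits)
    # Largest t >= 0 with n * 10^t <= eps * P (clamping at 0 is an assumption).
    t = 0
    while n * 10 ** (t + 1) <= eps * P:
        t += 1
    truncated = [p // 10 ** t for p in profits]
    return exact_by_profit(truncated, weights, capacity)


profits, weights = [1234, 3456, 5678, 7891], [2, 3, 4, 5]
items = knapsack_fptas(profits, weights, capacity=7, eps=0.5)
print(items, sum(profits[i] for i in items))
```

In this instance t = 2, so the dynamic program works with the profits 12, 34, 56, 78 instead of the original four-digit numbers; the returned set is guaranteed to have at least half of the optimal profit 9134, and here it loses almost nothing.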

We summarize the result in the following theorem. 

Theorem 10.8. Algorithm 18 is a fully polynomial time approximation scheme for the 
knapsack problem. 

References 

The presentation of the material in this section is based on [9, Chapters 1 & 3].